A LARGE DEVIATION APPROACH TO SOME TRANSPORTATION COST INEQUALITIES



NATHAEL GOZLAN AND CHRISTIAN LEONARD 

Abstract. New transportation cost inequalities are derived by means of elementary large deviation arguments. Their dual characterization is proved; this provides an extension of a well-known result of S. Bobkov and F. Götze. Their tensorization properties are investigated. Sufficient conditions (and necessary conditions too) for these inequalities are stated in terms of the integrability of the reference measure. Applying these results leads to new deviation results: concentration of measure and deviations of empirical processes.



1. Introduction 

In the whole paper, $X$ is a Polish space equipped with its Borel $\sigma$-field. We denote by $\mathcal{P}(X)$ the set of all probability measures on $X$.

1.1. Transportation cost inequalities and concentration of measure. Let us first recall what transportation cost inequalities are and their well-known consequences in terms of concentration of measure.

Transportation cost. Let $c : X\times X \to [0,\infty)$ be a measurable function on the product space $X\times X$. For any couple of probability measures $\mu$ and $\nu$ on $X$, the transportation cost (associated with the cost function $c$) of $\mu$ on $\nu$ is

$$\mathcal{T}_c(\mu,\nu) = \inf_\pi \int_{X\times X} c(x,y)\,\pi(dx\,dy) \in [0,\infty]$$

where the infimum is taken over all probability measures $\pi$ on $X\times X$ with first marginal $\pi(dx\times X) = \mu(dx)$ and second marginal $\pi(X\times dy) = \nu(dy)$.
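For finitely supported measures the infimum above is a finite linear program. As an illustration only (not taken from the paper), the sketch below treats the special case where $\mu$ and $\nu$ are uniform on $n$ points each: by the Birkhoff-von Neumann theorem an optimal $\pi$ can then be taken to be a permutation, so a brute-force search over permutations returns $\mathcal{T}_c(\mu,\nu)$ exactly for small $n$. The function name `transport_cost_uniform` is hypothetical.

```python
from itertools import permutations

def transport_cost_uniform(xs, ys, cost):
    """T_c(mu, nu) for mu uniform on xs and nu uniform on ys (same number of atoms).

    An optimal coupling can be taken to be a permutation (Birkhoff-von Neumann),
    so the infimum over couplings reduces to a minimum over permutations.
    """
    n = len(xs)
    if n != len(ys):
        raise ValueError("xs and ys must have the same number of atoms")
    return min(
        sum(cost(xs[i], ys[p[i]]) for i in range(n)) / n
        for p in permutations(range(n))
    )

d = lambda x, y: abs(x - y)
print(transport_cost_uniform([0.0, 1.0, 2.0], [0.5, 2.5, 1.5], d))                       # 0.5
print(transport_cost_uniform([0.0, 1.0, 2.0], [0.5, 2.5, 1.5], lambda x, y: d(x, y) ** 2))  # 0.25
```

The factorial cost of the search restricts this to very small $n$; it is meant only to make the definition concrete.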

$T_p$-inequalities. Popular cost functions are $c(x,y) = d(x,y)^p$ where $d$ is a metric on $X$ and $p \ge 1$. It is known that for some $\mu\in\mathcal{P}(X)$ and $p\ge1$ one can prove the following transportation cost inequality

$$\mathcal{T}_{d^p}(\mu,\nu)^{1/p} \le \sqrt{2C\,H(\nu\mid\mu)}, \quad \forall\nu\in\mathcal{P}(X) \qquad (1.1)$$

for some positive constant $C$, where $H(\nu\mid\mu)$ is the relative entropy of $\nu$ with respect to $\mu$ defined by

$$H(\nu\mid\mu) = \int_X \log\left(\frac{d\nu}{d\mu}\right) d\nu$$

if $\nu$ is absolutely continuous with respect to $\mu$ and $H(\nu\mid\mu) = \infty$ otherwise. In presence of the family of inequalities (1.1), one says that $\mu$ satisfies $T_p(C)$.
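On a finite set, with $\mu$ and $\nu$ given by probability vectors, the definition above reads $H(\nu\mid\mu) = \sum_i \nu_i\log(\nu_i/\mu_i)$ with the convention $0\log 0 = 0$. A minimal sketch (the helper name is hypothetical):

```python
import math

def relative_entropy(nu, mu):
    """H(nu | mu) for probability vectors nu, mu on the same finite set.

    Returns math.inf when nu is not absolutely continuous with respect to mu,
    and uses the convention 0 * log(0 / q) = 0.
    """
    h = 0.0
    for p, q in zip(nu, mu):
        if p == 0.0:
            continue          # contributes nothing
        if q == 0.0:
            return math.inf   # nu charges a point that mu does not charge
        h += p * math.log(p / q)
    return h

print(relative_entropy([0.5, 0.5], [0.5, 0.5]))   # 0.0
print(relative_entropy([1.0, 0.0], [0.5, 0.5]))   # log 2
print(relative_entropy([0.5, 0.5], [1.0, 0.0]))   # inf
```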
For instance, the Csiszár-Kullback-Pinsker inequality, see (2.9), is $T_1(1)$ with the Hamming metric $d(x,y) = 1_{x\neq y}$. The Csiszár-Kullback-Pinsker inequality is often called Pinsker's inequality; it will be referred to later as the CKP inequality. It holds for any $\mu\in\mathcal{P}(X)$. On the other hand, $T_2$-inequalities are much more difficult to obtain. It is shown in the articles by F. Otto and C. Villani and by S. Bobkov, I. Gentil and M. Ledoux [1] that if $\mu$ satisfies the logarithmic Sobolev inequality, then it also satisfies $T_2$. A standard example of a probability measure $\mu$ that satisfies $T_2$ is the normal law. In [18], M. Talagrand has given a proof of $T_2(C)$ for the standard normal law, not relying on any log-Sobolev inequality, with the sharp constant $C = 1$.

Concentration of measure. As a consequence of $T_1(C)$, K. Marton has obtained the following concentration inequality for $\mu$:

$$\mu(\{x;\ d(x,A) > r\}) \le \exp\left(-\frac{\left(r-\sqrt{2C\log 2}\right)^2}{2C}\right) \qquad (1.2)$$

for all measurable subsets $A$ such that $\mu(A)\ge 1/2$ and all $r \ge \sqrt{2C\log 2}$. Marton's concentration argument easily extends to more general situations. This is of considerable importance and justifies the search for $T_1$-inequalities.

Product of measures. Suppose that $\mu_1,\dots,\mu_n$ satisfy respectively $T_p(C_1),\dots,T_p(C_n)$. By means of a coupling argument which is also due to K. Marton (the so-called Marton's coupling argument), one can check that when $p = 1$, the product measure $\mu_1\otimes\cdots\otimes\mu_n$ satisfies $T_1(C_1+\cdots+C_n)$, while when $p = 2$, $\mu_1\otimes\cdots\otimes\mu_n$ satisfies $T_2(\max(C_1,\dots,C_n))$. In particular, if $\mu$ satisfies $T_1(C)$ then $\mu^{\otimes n}$ satisfies $T_1(nC)$. This inequality deteriorates as $n$ grows. On the other hand, if $\mu$ satisfies $T_2(C)$ then $\mu^{\otimes n}$ also satisfies $T_2(C)$, and this still holds for the infinite product $\mu^{\otimes\infty}$.

By Jensen's inequality, we have $(\mathcal{T}_d)^2 \le \mathcal{T}_{d^2}$, so that $T_2(C)$ implies $T_1(C)$. As the standard normal law $\gamma$ satisfies $T_2(1)$, it is also shown in [18] that the standard normal law on $\mathbb{R}^n$, $\gamma^n$, satisfies $T_2(1)$ and therefore $T_1(1)$ and the concentration inequality

$$\gamma^n(\{x;\ d_2(x,A) > r\}) \le \exp\left(-\frac{\left(r-\sqrt{2\log 2}\right)^2}{2}\right)$$

for all measurable subsets $A$ such that $\gamma^n(A)\ge 1/2$ and all $r\ge\sqrt{2\log 2}$, where $d_2$ is the Euclidean distance on $\mathbb{R}^n$. This concentration result holds for all $n$ and is very close to the optimal concentration result obtained by means of isoperimetric arguments (see M. Ledoux's monograph [?], Corollary 2.6), which is: $\gamma^n(\{x;\ d_2(x,A) > r\}) \le e^{-r^2/2}$, for all $r\ge 0$.
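A quick numerical sanity check, not taken from the paper: for $n = 1$ and the half-line $A = (-\infty, 0]$ one has $\gamma(A) = 1/2$ and $\{x;\ d(x,A) > r\} = (r,\infty)$, so the exact left-hand side is a Gaussian tail. The sketch compares it with the optimal bound $e^{-r^2/2}$ and with the right-hand side of Marton's inequality for $C = 1$; the function names are illustrative.

```python
import math

def gaussian_tail(r):
    # gamma((r, inf)) = gamma({x; d(x, A) > r}) for the half-line A = (-inf, 0], gamma(A) = 1/2
    return 0.5 * math.erfc(r / math.sqrt(2.0))

def marton_bound(r, C=1.0):
    # Marton's bound exp(-(r - sqrt(2 C log 2))^2 / (2 C)), stated for r >= sqrt(2 C log 2)
    r0 = math.sqrt(2.0 * C * math.log(2.0))
    if r < r0:
        raise ValueError("the bound is stated for r >= sqrt(2 C log 2)")
    return math.exp(-((r - r0) ** 2) / (2.0 * C))

r0 = math.sqrt(2.0 * math.log(2.0))
for r in [r0, 2.0, 3.0, 4.0]:
    print(f"r={r:.3f}  exact={gaussian_tail(r):.3e}  "
          f"optimal={math.exp(-r * r / 2):.3e}  marton={marton_bound(r):.3e}")
```

The printed values illustrate the ordering exact tail, then $e^{-r^2/2}$, then Marton's bound, which is the sense in which the transportation argument loses only a shift by $\sqrt{2\log 2}$.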

In view of (1.2) and of this optimal concentration inequality, it now appears that with $X = \mathbb{R}^n$, $T_1(C)$ implies that $\mu$ concentrates at least as well as a normal law with variance $C$. One may say that $\mu$ performs a Gaussian concentration when (1.2) holds for some $C$.

Criteria for $T_1$. It has recently been proved by H. Djellout, A. Guillin and L. Wu in [8] that $\mu$ satisfies $T_1(C)$ for some $C$ if and only if

$$\int_X e^{a_0 d(x_0,x)^2}\,\mu(dx) < \infty \qquad (1.3)$$

for some $a_0 > 0$ and some (and therefore all) $x_0$ in $X$. It follows that (1.3) is a characterization of the Gaussian concentration. The proof of this result in [8] relies on a dual characterization of $T_1$ which has been obtained by S. Bobkov and F. Götze in [2]. This characterization is the following: $T_1(C)$ holds if and only if

$$\log\int_X e^{s(\varphi-\langle\varphi,\mu\rangle)}\,d\mu \le Cs^2/2, \qquad (1.4)$$

for all $s\ge0$ and all bounded Lipschitz functions $\varphi$ with $\|\varphi\|_{Lip}\le 1$, where $\langle\varphi,\mu\rangle = \int_X\varphi\,d\mu$.
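As an elementary instance of (1.4) (an illustrative choice, not from the paper), let $\mu$ be the symmetric Bernoulli law on $X = \{-1, 1\}$ with $d(x,y) = |x-y|$. Adding a constant to $\varphi$ does not change the left-hand side, so every 1-Lipschitz $\varphi$ may be taken of the form $\varphi(\pm1) = \pm\delta$ with $|\delta|\le 1$; the left-hand side is then $\log\cosh(s\delta)\le s^2/2$, so (1.4) holds with $C = 1$. The sketch checks this on a grid of $s$ and $\delta$.

```python
import math

def centered_log_laplace(values, weights, s):
    # log of int exp(s (phi - <phi, mu>)) d mu, for mu supported by finitely many points
    mean = sum(w * v for v, w in zip(values, weights))
    return math.log(sum(w * math.exp(s * (v - mean)) for v, w in zip(values, weights)))

weights = [0.5, 0.5]                     # mu = (delta_{-1} + delta_{+1}) / 2
worst_ratio = 0.0                        # largest value of lhs / (s^2 / 2) seen on the grid
for delta in [0.25, 0.5, 1.0]:           # phi(-1) = -delta, phi(+1) = +delta
    for k in range(1, 101):
        s = 0.05 * k
        lhs = centered_log_laplace([-delta, delta], weights, s)
        worst_ratio = max(worst_ratio, lhs / (s * s / 2.0))
print(worst_ratio)                       # stays below 1: (1.4) holds with C = 1 on this grid
```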
The criterion (1.3) has been recovered very recently by F. Bolley and C. Villani in [?], where the relation between $C$ and $a_0$ is improved. This new proof relies on a strengthening of the CKP inequality where weights are allowed in the total variation norm. For a statement of this strengthened CKP inequality, see Corollary 3.24 below.

1.2. Presentation of the results. In this article, a larger class of transportation cost inequalities is investigated. It appears that the transportation cost inequalities $T_p$ defined by (1.1) enter the following larger class of inequalities, which will also be called transportation cost inequalities (TCIs):

$$\alpha(\mathcal{T}_c(\mu,\nu)) \le H(\nu\mid\mu), \quad \forall\nu\in\mathcal{P}(X) \qquad (1.5)$$

where $\alpha : [0,\infty)\to[0,\infty)$ is an increasing¹ function which vanishes at 0. The inequality (1.1) corresponds to $c = d^p$ with $\alpha(t) = t^{2/p}/(2C)$, $t\ge0$. Of course, one should rigorously restrict (1.5) to those $\nu\in\mathcal{P}(X)$ such that $\mathcal{T}_c(\mu,\nu)$ is well-defined.
The aim of this paper is threefold. 

(i) One proves TCIs by means of large deviation arguments. The authors hope that this should provide a guideline for other functional inequalities.

(ii) One obtains deviation results by means of TCIs.

(iii) One extends already existing results, especially in the area of $T_1$-inequalities.

One says that we have a $T_1$-inequality if

$$\alpha(\mathcal{T}_d(\mu,\nu)) \le H(\nu\mid\mu), \quad \forall\nu\in\mathcal{P}_d(X) \qquad (T_1)$$

where $d$ is a metric and $\mathcal{P}_d(X)$ is the set of all probability measures which integrate $d(x_0,\cdot)$.

As regards item (i), it is no surprise that, because of the relative entropy entering TCIs, Sanov's theorem plays a crucial role in our approach. Let

$$L_n = \frac1n\sum_{i=1}^n\delta_{X_i}$$

be the empirical measure of an $n$-iid sample $(X_i)$ of the law $\mu\in\mathcal{P}(X)$. Sanov's theorem states that the sequence $\{L_n\}_{n\ge1}$ obeys the large deviation principle with rate function $\nu\mapsto H(\nu\mid\mu)$. The main idea is to control the deviations of the nonnegative random variables $\mathcal{T}_c(\mu,L_n)$ as $n$ tends to infinity. An easy heuristic description of this program is displayed in Section 2.2. We obtain the following recipe.



¹In the whole paper, an increasing function means a nondecreasing function, which may be constant on some intervals.
Recipe 1.6. Any increasing function $\alpha$ such that $\alpha(0) = 0$ and

$$\limsup_{n\to\infty}\frac1n\log P(\mathcal{T}_c(\mu,L_n)\ge t) \le -\alpha(t)$$

for all $t > 0$ satisfies the TCI (1.5).

Rigorously, one will have to require that $\alpha$ is a left continuous function. This result will be proved at Theorem 7.1, and a weak version of it (with $\alpha$ convex) is proved at Proposition 5.3.

Not only can TCIs be derived with this recipe, but also another class of functional inequalities which we call Norm-Entropy Inequalities (NEIs); see (2.6) for their definition. Let us only emphasize in this introductory section that $T_1$-inequalities are NEIs.

As regards item (ii), concentration inequalities for general measures and deviation inequalities for empirical processes are derived by means of $T_1$-inequalities in Section 6.

As regards item (iii), the main technical (easy) result is Theorem 3.7, which is an extension of Bobkov and Götze's characterization of $T_1(C)$ stated at (1.4). It gives a dual characterization of all convex TCIs: those TCIs with $\alpha$ convex and increasing. Note that, to the knowledge of the authors, all known TCIs are convex. As a consequence, among others, one recovers the results of [?] about weighted CKP inequalities in Corollary 3.24.

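Tensorization of convex TCIs is also handled. The main result on this topic is Theorem 4.2. It states that if $\alpha_1(\mathcal{T}_{c_1}(\mu_1,\nu_1)) \le H(\nu_1\mid\mu_1)$ for all $\nu_1$ and $\alpha_2(\mathcal{T}_{c_2}(\mu_2,\nu_2)) \le H(\nu_2\mid\mu_2)$ for all $\nu_2$, then $\alpha_1\square\alpha_2(\mathcal{T}_{c_1\oplus c_2}(\mu_1\otimes\mu_2,\nu)) \le H(\nu\mid\mu_1\otimes\mu_2)$ for every probability measure $\nu$ on the product space, where $c_1\oplus c_2((x_1,x_2),(y_1,y_2)) = c_1(x_1,y_1)+c_2(x_2,y_2)$ and $\alpha_1\square\alpha_2(t) = \inf\{\alpha_1(t_1)+\alpha_2(t_2);\ t_1+t_2 = t\}$ is the inf-convolution of $\alpha_1$ and $\alpha_2$.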
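For example, $\alpha_i(t) = t^2/(2C_i)$ encodes $T_1(C_i)$, and the inf-convolution $\alpha_1\square\alpha_2(t) = \inf\{\alpha_1(t_1)+\alpha_2(t_2);\ t_1+t_2 = t\}$ is then $t^2/(2(C_1+C_2))$, which matches the $T_1(C_1+C_2)$ tensorization recalled in Section 1.1. The grid-based sketch below (a hypothetical helper, for illustration only) checks this numerically.

```python
def inf_convolution(a1, a2, t, steps=10_000):
    # inf-convolution of a1 and a2 at t: inf { a1(t1) + a2(t - t1) : 0 <= t1 <= t },
    # minimized over a grid of t1 (both functions live on [0, inf), so t1, t - t1 >= 0)
    return min(a1(t * k / steps) + a2(t - t * k / steps) for k in range(steps + 1))

C1, C2 = 1.0, 3.0
a1 = lambda t: t * t / (2.0 * C1)   # T_1(C1)
a2 = lambda t: t * t / (2.0 * C2)   # T_1(C2)
for t in [0.5, 1.0, 2.0]:
    print(t, inf_convolution(a1, a2, t), t * t / (2.0 * (C1 + C2)))
```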

Integral criteria are investigated in Section 5. It emerges from our analysis via large deviations that integral criteria only control the behavior of $\alpha(t)$ in (1.5) for $t$ away from zero. As a consequence, complete results are only derived for $T_1$-inequalities. It is also proved that the function $\alpha(t)$ of a $T_1$-inequality has a quadratic behavior for $t$ near zero. The integral criterion for $T_1$ is stated at Theorem 5.19. It is the following:
Let $d$ be a lower semicontinuous metric. Suppose that $a > 0$ satisfies $\int_X e^{a\,d(x_0,x)}\,\mu(dx)\le 2$ for some $x_0\in X$ and that $\gamma$ is an increasing convex function which satisfies $\gamma(0) = 0$ and $\int_X e^{\gamma(d(x_1,x))}\,\mu(dx)\le B < \infty$ for some $x_1\in X$; then

$$\alpha(t) = \max\left(\left(\sqrt{at+1}-1\right)^2,\ 2\gamma(t/2)-2\log B\right), \quad t\ge0$$

satisfies $(T_1)$.

Note that $(\sqrt{at+1}-1)^2 = a^2t^2/4 + o_{t\to0}(t^2)$ is efficient for $t$ near zero, while $2\gamma(t/2)-2\log B$ is efficient for $t$ away from zero.
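The function $\alpha$ of this criterion is easy to evaluate once $a$, $\gamma$ and $B$ are known. The sketch below is only an illustration: the choice $\gamma(r) = r^2/4$, $a = 1$, $B = 2$ is an arbitrary assumption, not data from any specific measure. It shows the switch from the quadratic regime $a^2t^2/4$ near zero to the regime $2\gamma(t/2)-2\log B$ for large $t$.

```python
import math

def alpha_T1(t, a, gamma, B):
    """alpha(t) = max((sqrt(a t + 1) - 1)^2, 2 gamma(t / 2) - 2 log B) for t >= 0.

    a:     constant with int exp(a d(x0, x)) mu(dx) <= 2
    gamma: increasing convex function with gamma(0) = 0
    B:     bound on int exp(gamma(d(x1, x))) mu(dx)   (so B >= 1)
    """
    near_zero = (math.sqrt(a * t + 1.0) - 1.0) ** 2
    away_from_zero = 2.0 * gamma(t / 2.0) - 2.0 * math.log(B)
    return max(near_zero, away_from_zero)

gamma = lambda r: r * r / 4.0          # hypothetical Gaussian-type integrability
for t in [1e-4, 1.0, 10.0]:
    print(t, alpha_T1(t, a=1.0, gamma=gamma, B=2.0), t * t / 4.0)
```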

This theorem extends the integral criterion (1.3) of [8] and

The last Section 7 is devoted to abstract results. In particular, the extended version Recipe 2.8 of Recipe 1.6 is proved at Theorem 7.1. The authors hope that the set of abstract results stated in this section could be the starting point for the derivation of new functional inequalities.
2. Deriving T-inequalities by means of large deviations. Heuristics

The dual equality associated with the primal minimization problem leading to $\mathcal{T}_c(\mu,\nu)$ is



$$\mathcal{T}_c(\mu,\nu) = \sup_{(\psi,\varphi)\in\Phi_c}\left\{\int_X\psi\,d\mu+\int_X\varphi\,d\nu\right\} \qquad (2.1)$$

where $\Phi_c$ is the set of all couples $(\psi,\varphi)$ of Borel measurable bounded functions on $X$ such that $\psi(x)+\varphi(y)\le c(x,y)$ for all $x,y\in X$. This result is known as the Kantorovich duality theorem and it holds true provided that $c$ is lower semicontinuous. It still holds if $\Phi_c$ is replaced by $C_b\cap\Phi_c$, which is the subset of all couples $(\psi,\varphi)\in\Phi_c$ of continuous bounded functions. In the special case where $c = d$ is a lower semicontinuous metric, the above dual equality also holds with $\Phi_d$ the set of all couples $(\psi,\varphi)$ of measurable (or continuous as well) bounded functions such that $\psi = -\varphi$ and $\varphi$ is a $d$-Lipschitz function with a Lipschitz constant at most 1. In other words,



$$\mathcal{T}_d(\mu,\nu) = \sup\left\{\int_X\varphi\,d(\nu-\mu);\ \varphi\in B(X),\ \|\varphi\|_{Lip}\le1\right\} := \|\nu-\mu\|^*_{Lip} \qquad (2.2)$$

where the space of all Borel measurable bounded functions on $X$ is denoted $B(X)$ and $\|\varphi\|_{Lip} = \sup_{x\neq y}\frac{|\varphi(x)-\varphi(y)|}{d(x,y)}$ is the usual Lipschitz seminorm. This result, known as the Kantorovich-Rubinstein theorem, identifies the transportation cost $\mathcal{T}_d(\mu,\nu)$ with the dual norm $\|\nu-\mu\|^*_{Lip}$.
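On the real line with $d(x,y) = |x-y|$ there is a classical closed form (a standard fact, not stated in the paper): $\mathcal{T}_d(\mu,\nu) = \int_{\mathbb{R}}|F_\mu(x)-F_\nu(x)|\,dx$, where $F$ denotes the distribution function. A minimal sketch for finitely supported measures, with a hypothetical function name:

```python
def w1_real_line(points_mu, weights_mu, points_nu, weights_nu):
    """T_d(mu, nu) on the real line with d(x, y) = |x - y|, for finitely supported mu, nu.

    Uses the one-dimensional identity T_d(mu, nu) = int |F_mu(x) - F_nu(x)| dx,
    where F denotes the distribution function; the integrand is piecewise constant.
    """
    support = sorted(set(points_mu) | set(points_nu))

    def cdf(points, weights, x):
        return sum(w for p, w in zip(points, weights) if p <= x)

    total = 0.0
    for left, right in zip(support, support[1:]):
        gap = cdf(points_mu, weights_mu, left) - cdf(points_nu, weights_nu, left)
        total += abs(gap) * (right - left)
    return total

# Moving mass 1/2 from 0 to 1 and mass 1/2 from 2 to 1 costs 1/2 + 1/2 = 1.
print(w1_real_line([0.0, 2.0], [0.5, 0.5], [1.0], [1.0]))
```

In the dual picture (2.2), the value 1 in this example is attained by the 1-Lipschitz function $\varphi(x) = -|x-1|$.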

2.1. A larger class of transportation cost inequalities: T-inequalities. After these considerations, it appears that the transportation cost inequality (1.5) enters the following larger class of inequalities, which we call T-inequalities:

$$\alpha(\mathcal{T}(\nu)) \le H(\nu\mid\mu), \quad \forall\nu\in\mathcal{N} \qquad (2.3)$$

where $\alpha : [0,\infty)\to[0,\infty)$ is an increasing function which vanishes at 0, $\mathcal{N}$ is a subset of $\mathcal{P}(X)$ and $\mathcal{T}$ is defined by

$$\mathcal{T}(\nu) = \sup_{(\psi,\varphi)\in\Phi}\left\{\int_X\psi\,d\mu+\int_X\varphi\,d\nu\right\} \qquad (2.4)$$

where $\Phi$ is a class of couples of functions $(\psi,\varphi)$ with $\psi$ integrable with respect to $\mu$ and $\varphi$ integrable with respect to $\nu$. Note that (2.3) is a family of inequalities where the value $+\infty$ is allowed with the convention that $\alpha(+\infty) = \lim_{t\to\infty}\alpha(t)$.






We are going to consider two cases, which correspond to what will be called Transportation Cost Inequalities and Norm-Entropy Inequalities.

Transportation Cost Inequalities. We assume that $c$ is a nonnegative lower semicontinuous cost function. The space of all continuous bounded functions on $X$ is denoted $C_b(X)$. In the situation where $\Phi$ is equal to

$$\Phi_c := \{(\psi,\varphi)\in C_b(X)\times C_b(X);\ \psi\oplus\varphi\le c\},$$

where $\psi\oplus\varphi(x,y) = \psi(x)+\varphi(y)$, the family of inequalities (2.3) is called a Transportation Cost Inequality (TCI). Indeed, the Kantorovich dual equality (2.1) states that

$$\mathcal{T}(\nu) = \mathcal{T}_c(\mu,\nu)\in[0,\infty],$$

for all $\nu\in\mathcal{N}\subset\mathcal{P}(X)$. In this situation, inequality (2.3) is

$$\alpha(\mathcal{T}_c(\mu,\nu)) \le H(\nu\mid\mu), \quad \forall\nu\in\mathcal{N}. \qquad (2.5)$$

Suppose that there exists a nonnegative measurable function $\chi$ on $X$ such that $c(x,y)\le\chi(x)+\chi(y)$ for all $x,y\in X$ and $\int_X\chi\,d\mu<\infty$. A natural set $\mathcal{N}$ is the set of all probability measures $\nu$ such that $\int_X\chi\,d\nu<\infty$.

Norm-Entropy Inequalities. Let $U$ be a set of measurable functions on $X$ such that $U = -U$. Let us take $\Phi = \Phi_U$ with

$$\Phi_U := \{(-\varphi,\varphi);\ \varphi\in U\}.$$

This gives

$$\mathcal{T}(\nu) = \sup_{\varphi\in U}\int_X\varphi\,d(\nu-\mu) := \|\nu-\mu\|^*_U\in[0,\infty].$$

In this case, inequality (2.3) is

$$\alpha(\|\nu-\mu\|^*_U) \le H(\nu\mid\mu), \quad \forall\nu\in\mathcal{P}_U \qquad (2.6)$$

where $\mathcal{P}_U$ is the set of all $\nu\in\mathcal{P}(X)$ such that $\int_X|\varphi|\,d\nu<\infty$ for all $\varphi\in U$. The family of inequalities (2.6) is called a Norm-Entropy Inequality (NEI).

As a typical example, let $(F,\|\cdot\|)$ be a seminormed space of measurable functions on $X$ and $U := \{\varphi\in F;\ \|\varphi\|\le1\}$ its unit ball. Then $\|\nu-\mu\|^*_U$ is the dual norm of $\|\cdot\|$.

In the case where the cost function of a TCI is a lower semicontinuous metric $d$, the Kantorovich-Rubinstein theorem (see (2.2)) states that

$$\mathcal{T}_d(\mu,\nu) = \|\nu-\mu\|^*_U$$

for all $\mu,\nu\in\mathcal{P}(X)$, where $\Phi_U$ is built with $F$ the space of all bounded $d$-Lipschitz functions on $X$ endowed with the seminorm $\|\cdot\|_{Lip}$. In this special important case, TCI and NEI match.

2.2. Large deviations enter the game. In Sections 3 and 7, T-inequalities will be proved by means of a large deviation approach. The integral functional $H(\cdot\mid\mu)$ will be interpreted as the rate function of the large deviation principle (LDP) of the sequence of the empirical measures

$$L_n = \frac1n\sum_{i=1}^n\delta_{X_i}$$

of an iid sample $(X_i)$ of the law $\mu$ ($\delta_x$ stands for the Dirac measure at $x$). Indeed, by Sanov's theorem, $\{L_n\}$ obeys the LDP in $\mathcal{P}(X)$ with the rate function

$$I(\nu) := H(\nu\mid\mu), \quad \nu\in\mathcal{N}.$$

Roughly speaking, the sequence of random variables $\{L_n\}$ obeys the LDP in $\mathcal{N}$ with the rate function $I$ if one has the following collection of estimates

$$P(L_n\in A)\asymp\exp\Big[-n\inf_{\nu\in A}I(\nu)\Big]$$

as $n$ tends to infinity, for any "good" subset $A$ of $\mathcal{N}$. Let us introduce the nonnegative random variables

$$T_n = \mathcal{T}(L_n), \quad n\ge1.$$

Suppose that $\mathcal{T}$ is regular enough for the sets $A_t = \{\nu\in\mathcal{N};\ \mathcal{T}(\nu)\ge t\}$, $t\ge0$, to be "good" sets. This means that for all $t\ge0$,

$$P(T_n\ge t) = P(L_n\in A_t)\asymp\exp[-n\,i(t)]$$

with $i(t) = \inf\{I(\nu);\ \nu\in\mathcal{N},\ \mathcal{T}(\nu)\ge t\}\in[0,\infty]$. Suppose that $\alpha$ is a deviation function for the sequence $\{T_n\}$, in the sense that it is an increasing nonnegative function on $[0,\infty)$ such that for all $t\ge0$

$$\limsup_{n\to\infty}\frac1n\log P(T_n\ge t)\le-\alpha(t). \qquad (2.7)$$

We obtain $\alpha(t)\le i(t)$ for all $t$ and, in particular, with $t = \mathcal{T}(\nu)$, we obtain for all $\nu\in\mathcal{N}$: $\alpha(\mathcal{T}(\nu))\le i(\mathcal{T}(\nu))\le I(\nu)$. This is precisely the desired inequality (2.3).
The recipe is:

Recipe 2.8. Any deviation function $\alpha$ of $\{T_n\}$ satisfies the T-inequality (2.3).

Because of the sup entering the definition of $T_n = \sup_{(\psi,\varphi)\in\Phi}(\langle\varphi,L_n\rangle+\langle\psi,\mu\rangle)$, one may expect to get into trouble when trying to prove a full LDP for $\{T_n\}$. Fortunately, only the subclass of "deviation sets" $A_t = \{\nu\in\mathcal{N};\ \mathcal{T}(\nu)\ge t\}$, $t\ge0$, will be really useful. This line of reasoning will be put on a solid ground at Theorem 3.7, Proposition 5.5 and Theorem 7.1.

2.3. An example: CKP inequality. As a simple illustration, we propose to prove the CKP inequality by searching for a deviation function $\alpha$ in the sense of (2.7). This is not intended to be the shortest proof, but only an illustration of the proposed method. Recall that the CKP inequality is

$$\frac12\|\nu-\mu\|^2_{TV}\le H(\nu\mid\mu), \quad \forall\nu\in\mathcal{P}(X) \qquad (2.9)$$

where $\|\xi\|_{TV}$ is the total variation of the signed bounded measure $\xi$. As

$$\|\xi\|_{TV} = \sup\left\{\int_X\varphi\,d\xi;\ \varphi\text{ measurable such that }\|\varphi\| := \sup_{x\in X}|\varphi(x)|\le1\right\},$$

(2.9) is the NEI with $F = B(X)$ the space of bounded measurable functions furnished with the uniform norm $\|\varphi\| := \sup_{x\in X}|\varphi(x)|$, $\mathcal{N} = \mathcal{P}(X)$ and $\alpha(t) = t^2/2$.
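On a finite set, $\|\nu-\mu\|_{TV}$ as defined above equals $\sum_i|\nu_i-\mu_i|$ (take $\varphi = \pm1$ according to the sign of $\nu_i-\mu_i$), so (2.9) can be spot-checked numerically. The sketch below is illustrative only; drawing random distributions with full support is an assumption made to keep $H$ finite. It records the smallest observed slack $H(\nu\mid\mu)-\|\nu-\mu\|^2_{TV}/2$.

```python
import math
import random

def total_variation(nu, mu):
    # ||nu - mu||_TV = sup over |phi| <= 1 of int phi d(nu - mu) = sum_i |nu_i - mu_i|
    return sum(abs(p - q) for p, q in zip(nu, mu))

def relative_entropy(nu, mu):
    # H(nu | mu), assuming mu_i > 0 for every i (so that nu << mu automatically)
    return sum(p * math.log(p / q) for p, q in zip(nu, mu) if p > 0)

def random_probability(k, rng):
    w = [rng.random() + 1e-12 for _ in range(k)]
    s = sum(w)
    return [x / s for x in w]

rng = random.Random(0)
worst = float("inf")
for _ in range(10_000):
    k = rng.randint(2, 6)
    mu, nu = random_probability(k, rng), random_probability(k, rng)
    # slack of (2.9): H(nu | mu) - ||nu - mu||_TV^2 / 2, which should be nonnegative
    worst = min(worst, relative_entropy(nu, mu) - total_variation(nu, mu) ** 2 / 2)
print(worst)
```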
Consider an iid sample $(X_i)$ of the law $\mu$ and its associated sequence of empirical measures $L_n = \frac1n\sum_{i=1}^n\delta_{X_i}$. For all $n$ and all $\varphi\in U = \{\varphi\in B(X);\ \|\varphi\|\le1\}$, define the random variable

$$T_n^\varphi = \langle\varphi,L_n-\mu\rangle = \frac1n\sum_{i=1}^nY_i^\varphi$$

where $Y_i^\varphi = \varphi(X_i)-E\varphi(X_i)$. Cramér's theorem states that $\{T_n^\varphi\}$ obeys the LDP in $\mathbb{R}$ with rate function $\Lambda_\varphi^*$: the convex conjugate of the log-Laplace transform $\Lambda_\varphi(s) = \log Ee^{sY^\varphi}$, $s\in\mathbb{R}$. Recall that the convex conjugate of $f$ is defined by $f^*(t) = \sup_{s\in\mathbb{R}}\{st-f(s)\}\in(-\infty,\infty]$, $t\in\mathbb{R}$.

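Sanov's theorem holds in $\mathcal{P}(X)$ with the weak topology $\sigma(\mathcal{P}(X),B(X))$. As $\nu\in\mathcal{P}(X)\mapsto\langle\varphi,\nu-\mu\rangle$ is $\sigma(\mathcal{P}(X),B(X))$-continuous for all $\varphi\in B(X)$, one can apply the contraction principle. It gives us for all $t$

$$\Lambda_\varphi^*(t) = \inf\{H(\eta\mid\mu);\ \eta\in\mathcal{P}(X):\ \langle\varphi,\eta-\mu\rangle = t\},$$

which in turn implies that for all $\varphi\in U$,

$$\Lambda_\varphi^*(\langle\varphi,\nu-\mu\rangle)\le H(\nu\mid\mu), \quad \nu\in\mathcal{P}(X).$$

As $Y^\varphi$ takes its values in $[-1-E\varphi(X),\,1-E\varphi(X)]$, an interval of length 2, by Hoeffding's inequality we have

$$\Lambda_\varphi(s)\le s^2/2 \qquad (2.10)$$

for all real $s$. It follows that $\Lambda_\varphi^*(t)\ge\sup_{s\in\mathbb{R}}\{st-s^2/2\} = t^2/2$ for all real $t$. Hence, we have proved that for all $\varphi\in U$,

$$\alpha(\langle\varphi,\nu-\mu\rangle)\le H(\nu\mid\mu), \quad \forall\nu\in\mathcal{P}(X)$$

with $\alpha(t) = t^2/2$. It follows that $\alpha(\sup_{\varphi\in U}\langle\varphi,\nu-\mu\rangle)\le H(\nu\mid\mu)$ for all $\nu\in\mathcal{P}(X)$, which is the CKP inequality (2.9). □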
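The two analytic steps of this proof can be reproduced numerically for a concrete finitely supported $Y^\varphi$. The sketch below uses a hypothetical $\varphi$ taking the values $-1, 0, 1$ with probabilities $0.2, 0.3, 0.5$. It checks Hoeffding's bound (2.10) on a grid of $s$, then evaluates $\Lambda_\varphi^*$ by maximizing $st-\Lambda_\varphi(s)$ over a finite grid, which can only underestimate the supremum.

```python
import math

def log_laplace(values, weights, s):
    # Lambda(s) = log E exp(s Y) for a random variable Y with finitely many values
    return math.log(sum(w * math.exp(s * v) for v, w in zip(values, weights)))

def conjugate_on_grid(f, t, s_max=20.0, steps=40_000):
    # f*(t) = sup_s { s t - f(s) }, approximated by a maximum over a grid of [-s_max, s_max]
    grid = (-s_max + 2.0 * s_max * k / steps for k in range(steps + 1))
    return max(s * t - f(s) for s in grid)

# Hypothetical phi with |phi| <= 1: values -1, 0, 1 with probabilities 0.2, 0.3, 0.5.
raw, weights = [-1.0, 0.0, 1.0], [0.2, 0.3, 0.5]
mean = sum(v * w for v, w in zip(raw, weights))
ys = [v - mean for v in raw]        # Y = phi(X) - E phi(X) ranges in an interval of length 2

# Hoeffding's inequality (2.10), checked on a grid of s (small tolerance for rounding)
hoeffding_ok = all(
    log_laplace(ys, weights, 0.1 * k) <= (0.1 * k) ** 2 / 2 + 1e-12 for k in range(-100, 101)
)
print(hoeffding_ok)

# Consequence: Lambda*(t) >= t^2 / 2
for t in [0.1, 0.5]:
    print(t, conjugate_on_grid(lambda s: log_laplace(ys, weights, s), t), t * t / 2)
```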

Some comments. In this proof, something interesting occurred. Let us denote $T_n := \sup_{\varphi\in U}T_n^\varphi$, and let $\beta(t) = -\limsup_{n\to\infty}\frac1n\log P(T_n\ge t)$ and $J_\varphi(t) = -\limsup_{n\to\infty}\frac1n\log P(T_n^\varphi\ge t)$ be the deviation functions of $T_n$ and $T_n^\varphi$. As $T_n\ge T_n^\varphi$ for all $\varphi$, we have $\beta\le\inf_\varphi J_\varphi$. This means that a priori $\inf_\varphi J_\varphi$ could be too large to be the $\alpha$ of the NEI. On the other hand, by (2.10), $\sup_\varphi\Lambda_\varphi(s)\le\Lambda(s) := s^2/2$ for all $s\ge0$, so that $t^2/2 = \Lambda^*(t)\le\inf_\varphi J_\varphi(t)$. Nevertheless, we have shown that $\Lambda^*$ is a convenient function $\alpha$ for our NEI.
It will be shown in a more general setting, at Theorem 7.7, that the convex lower semicontinuous envelope of $\inf_\varphi J_\varphi$ is the best increasing convex function $\alpha$ for this NEI.

3. Convex T-inequalities. A dual characterization

In the rest of the paper (except Section 7), our attention is restricted to those T-inequalities (2.3) where the function $\alpha$ is increasing and convex. In this case, (2.3) is said to be a convex T-inequality.

3.1. Sanov's theorem. This theorem will be central for the proof of the main result of this section, which is stated at Theorem 3.7.

Let the probability measure $\mu$ on $X$ be given. We consider a sequence of independent $X$-valued random variables $(X_i)_{i\ge1}$ identically distributed with law $\mu$. For any $n$ the empirical measure of this sample is

$$L_n = \frac1n\sum_{i=1}^n\delta_{X_i}.$$

We introduce the function space

$$\mathcal{F}_{\exp}(\mu) = \left\{\varphi : X\to\mathbb{R};\ \varphi\text{ measurable},\ \int_X\exp(a|\varphi|)\,d\mu<\infty\text{ for all }a>0\right\} \qquad (3.1)$$

of all the functions which admit exponential moments of all orders with respect to the measure $\mu$. We denote

$$\mathcal{N}_{\exp}(\mu) = \left\{\nu\in\mathcal{P}(X);\ \int_X|\varphi|\,d\nu<\infty\text{ for all }\varphi\in\mathcal{F}_{\exp}(\mu)\right\}$$

the set of all probability measures which integrate every function of $\mathcal{F}_{\exp}(\mu)$.

The set $\mathcal{P}(X)$ is furnished with the cylinder $\sigma$-field generated by the functions $\nu\mapsto\langle\varphi,\nu\rangle$.

Theorem 3.2 (A version of Sanov's theorem). The effective domain of $H(\cdot\mid\mu)$ is included in $\mathcal{N}_{\exp}(\mu)$ and the sequence $\{L_n\}$ obeys the large deviation principle with rate function $H(\cdot\mid\mu)$ in $\mathcal{N}_{\exp}(\mu)$ equipped with the weak topology $\sigma(\mathcal{N}_{\exp}(\mu),\mathcal{F}_{\exp}(\mu))$. This means that for all measurable subsets $A$ of $\mathcal{N}_{\exp}(\mu)$, we have

$$\liminf_{n\to\infty}\frac1n\log P(L_n\in A)\ge-\inf_{\nu\in\operatorname{int}A}H(\nu\mid\mu)\quad\text{and}$$

$$\limsup_{n\to\infty}\frac1n\log P(L_n\in A)\le-\inf_{\nu\in\operatorname{cl}A}H(\nu\mid\mu),$$

where int A and cl A are the interior and closure of A.

Proof. The proof is a variation of the classical proof of Sanov's theorem based on projective limits of LD systems (see [7], Thm 6.2.10). For two distinct detailed proofs of the present theorem, see ([?], Theorem 1.7) or ([12], Corollary 3.3). □

3.2. The class of functions $\mathcal{C}$. The functions $\alpha$ to be considered are assumed to be convex. Since $\alpha$ is also left continuous and increasing, we consider the following class of functions.

Definition 3.3 (of $\mathcal{C}$). The class $\mathcal{C}$ consists of all the functions $\alpha$ on $[0,\infty)$ which are convex, increasing, left continuous with $\alpha(0) = 0$.

For any $\alpha$ belonging to the class $\mathcal{C}$, denoting $t_* = \sup\{t\ge0;\ \alpha(t)<\infty\}$, $\alpha$ is continuous on $[0,t_*)$ and $\lim_{t\uparrow t_*}\alpha(t) = \alpha(t_*)$.

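The convex conjugate of a function $\alpha\in\mathcal{C}$ is replaced by the monotone conjugate $\alpha^\circledast$ defined by

$$\alpha^\circledast(s) = \sup_{t\ge0}\{st-\alpha(t)\}, \quad s\ge0$$

where the supremum is taken on $t\ge0$ instead of $t\in\mathbb{R}$. In fact, if $\alpha$ is extended by

$$\tilde\alpha(t) = \begin{cases}\alpha(t), & \text{if } t\ge0\\ 0, & \text{if } t<0\end{cases}\qquad\text{then the usual convex conjugate of }\tilde\alpha\text{ is}\qquad\tilde\alpha^*(s) = \begin{cases}\alpha^\circledast(s), & \text{if } s\ge0\\ +\infty, & \text{if } s<0.\end{cases}$$

As $\tilde\alpha$ is convex and lower semicontinuous, we have $\tilde\alpha^{**} = \tilde\alpha$. From this, it is not hard to deduce the following result.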
Proposition 3.4. For any function $\alpha$ on $[0,\infty)$, we have

(a) $\alpha\in\mathcal{C}\iff\alpha^\circledast\in\mathcal{C}$

(b) $\alpha\in\mathcal{C}\implies\alpha^{\circledast\circledast} = \alpha$.
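Proposition 3.4 can be illustrated on the basic example $\alpha(t) = t^2/(2C)$, which is in $\mathcal{C}$ and satisfies $\alpha^\circledast(s) = Cs^2/2$. The sketch below (illustration only, with hypothetical names) approximates the supremum over $t\ge0$ by a maximum over a finite grid, computes $\alpha^\circledast$, and then conjugates the closed form $Cs^2/2$ back to recover $\alpha$, as in part (b).

```python
def monotone_conjugate(f, s, x_max=50.0, steps=50_000):
    # sup_{x >= 0} { s x - f(x) }, approximated on the grid x = 0, x_max/steps, ..., x_max
    return max(s * (x_max * k / steps) - f(x_max * k / steps) for k in range(steps + 1))

C = 2.0
alpha = lambda t: t * t / (2.0 * C)         # an element of the class C
alpha_conj = lambda s: C * s * s / 2.0      # its monotone conjugate, in closed form

for s in [0.0, 0.5, 1.0, 3.0]:
    print(s, monotone_conjugate(alpha, s), alpha_conj(s))
for t in [0.5, 1.0, 4.0]:
    # conjugating twice gives back alpha, as in Proposition 3.4 (b)
    print(t, monotone_conjugate(alpha_conj, t), alpha(t))
```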

3.3. A convex criterion. Theorem 3.7 below is a criterion for a convex T-inequality to hold. It extends two well-known results of S. Bobkov and F. Götze ([2], Theorem 1.3 and statement (1.7)).

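Let $\mathcal{F}$ be a vector space of measurable functions $\varphi$ on $X$ such that

$$\int_X e^{|\varphi|}\,d\mu<\infty, \quad \forall\varphi\in\mathcal{F}. \qquad (3.5)$$

Let $\mathcal{N}_\mathcal{F}$ be the set of all probability measures which integrate $\mathcal{F}$:

$$\mathcal{N}_\mathcal{F} = \left\{\nu\in\mathcal{P}(X);\ \int_X|\varphi|\,d\nu<\infty,\ \forall\varphi\in\mathcal{F}\right\}.$$

Clearly, if the class $\Phi$ entering the definition of $\mathcal{T}(\nu)$ satisfies

$$(0,0)\in\Phi\subset\mathcal{F}\times\mathcal{F}, \qquad (3.6)$$

the function $\mathcal{T}$ is a well-defined $[0,\infty]$-valued function on $\mathcal{N}_\mathcal{F}$.

For $\phi = (\psi,\varphi)\in\Phi$, let $\Lambda_\phi(s)$ be the log-Laplace transform of $\varphi(X)+E\psi(X)$, where $X$ admits $\mu$ as its law. We have for all real $s$,

$$\Lambda_\phi(s) = \log\int_X\exp[s(\varphi(x)+\langle\psi,\mu\rangle)]\,\mu(dx).$$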

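Theorem 3.7. We assume (3.5) and (3.6). Let us consider the following statements, where $\alpha$ is any function in $\mathcal{C}$:

(a) $\alpha(\mathcal{T}(\nu))\le H(\nu\mid\mu)$, $\forall\nu\in\mathcal{N}_\mathcal{F}$.

(b) $\Lambda_\phi(s)\le\alpha^\circledast(s)$, $\forall s\ge0$, $\forall\phi\in\Phi$.

(c) $\alpha(t)\le\Lambda_\phi^*(t)$, $\forall t\ge0$, $\forall\phi\in\Phi$.

(d) $\limsup_{n\to\infty}\frac1n\log P(\langle\varphi,L_n\rangle+\langle\psi,\mu\rangle\ge t)\le-\alpha(t)$, $\forall t\ge0$, $\forall(\psi,\varphi)\in\Phi$.

(e) $\forall n\ge1$, $\frac1n\log P(\langle\varphi,L_n\rangle+\langle\psi,\mu\rangle\ge t)\le-\alpha(t)$, $\forall t\ge0$, $\forall(\psi,\varphi)\in\Phi$.

Then, we have (a) ⇔ (b) ⇔ (c) and (e) ⇒ (d) ⇒ (a).
If it is assumed in addition that for all $(\psi,\varphi)\in\Phi$,

$$\int_X(\varphi(x)+\psi(x))\,\mu(dx)\le0, \qquad (3.8)$$

then, we have (a) ⇔ (b) ⇔ (c) ⇔ (d) ⇔ (e).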

The most useful statement of this theorem is the criterion (b) ⇒ (a).
Clearly, the requirement (3.8) holds for all NEIs. It also holds for TCIs under the assumption that $c$ satisfies

$$c(x,x) = 0, \quad \forall x\in X. \qquad (3.9)$$

When working with TCIs, this will be assumed in the sequel.
Proof. Possibly considering the vector space $\mathcal{F}'$ spanned by $\mathcal{F}\cup C_b(X)$ instead of $\mathcal{F}$, one can assume that $\mathcal{F}$ separates $\mathcal{N}_\mathcal{F}$. Indeed, the assumptions (3.5) and (3.6) still hold with $\mathcal{F}'$ instead of $\mathcal{F}$ and we clearly have $\mathcal{N}_{\mathcal{F}'} = \mathcal{N}_\mathcal{F}$. Hence, we assume without loss of generality that $\mathcal{F}$ separates $\mathcal{N}_\mathcal{F}$. As a consequence, the weak topology $\sigma(\mathcal{N}_\mathcal{F},\mathcal{F})$ is Hausdorff: this is necessary to derive LDPs away from compactness troubles.

Note that the assumption (3.5) is equivalent to $\mathcal{F}\subset\mathcal{F}_{\exp}(\mu)$. It follows that under this assumption, Sanov's Theorem 3.2 implies that $\{L_n\}$ obeys the LDP in $\mathcal{N}_\mathcal{F}$ equipped with $\sigma(\mathcal{N}_\mathcal{F},\mathcal{F})$ with $H(\cdot\mid\mu)$ as its rate function.
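Consider, for any $(\psi,\varphi) := \phi\in\Phi$ and $n\ge1$,

$$T_n^\phi = \langle\varphi,L_n\rangle+\langle\psi,\mu\rangle = \frac1n\sum_{i=1}^n(\varphi(X_i)+E\psi(X_i)) \qquad (3.10)$$

so that $T_n := \mathcal{T}(L_n) = \sup_{\phi\in\Phi}T_n^\phi$. Cramér's theorem states that $\{T_n^\phi\}$ obeys the LDP in $\mathbb{R}$ with

$$\Lambda_\phi^*(t) = \sup_{s\in\mathbb{R}}\{st-\Lambda_\phi(s)\}, \quad t\in\mathbb{R}$$

as its rate function. In particular, for all real $t$

$$-\inf_{u>t}\Lambda_\phi^*(u)\le\liminf_{n\to\infty}\frac1n\log P(T_n^\phi\ge t)\le\limsup_{n\to\infty}\frac1n\log P(T_n^\phi\ge t)\le-\inf_{u\ge t}\Lambda_\phi^*(u). \qquad (3.11)$$

Because of assumption (3.6), the mapping $f_\phi : \nu\in\mathcal{N}_\mathcal{F}\mapsto\langle\varphi,\nu\rangle+\langle\psi,\mu\rangle\in\mathbb{R}$ is continuous for every $(\psi,\varphi)\in\Phi$. As $T_n^\phi = f_\phi(L_n)$, one can apply the contraction principle, which gives us for all real $t$

$$\Lambda_\phi^*(t) = \inf\{H(\nu\mid\mu);\ \nu\in\mathcal{N}_\mathcal{F}:\ \langle\varphi,\nu\rangle+\langle\psi,\mu\rangle = t\}. \qquad (3.12)$$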



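[(a) ⇔ (c)].

$$\begin{aligned}
\text{(a)} &\overset{(i)}{\iff}\alpha\Big(\sup_{(\psi,\varphi)\in\Phi}(\langle\varphi,\nu\rangle+\langle\psi,\mu\rangle)\Big)\le H(\nu\mid\mu),\ \forall\nu\in\mathcal{N}_\mathcal{F}\\
&\overset{(ii)}{\iff}\alpha(\langle\varphi,\nu\rangle+\langle\psi,\mu\rangle)\le H(\nu\mid\mu),\ \forall\nu\in\mathcal{N}_\mathcal{F},\ \forall\phi\in\Phi\\
&\iff\alpha(t)\le H(\nu\mid\mu),\ \forall t\in\mathbb{R},\ \forall\phi\in\Phi,\ \forall\nu\in\mathcal{N}_\mathcal{F}:\ \langle\varphi,\nu\rangle+\langle\psi,\mu\rangle = t\\
&\iff\alpha(t)\le\inf\{H(\nu\mid\mu);\ \nu\in\mathcal{N}_\mathcal{F}:\ \langle\varphi,\nu\rangle+\langle\psi,\mu\rangle = t\},\ \forall t\in\mathbb{R},\ \forall\phi\in\Phi\\
&\overset{(iii)}{\iff}\alpha\le\Lambda_\phi^*,\ \forall\phi\in\Phi\\
&\iff\text{(c)}
\end{aligned}$$

The equivalence (i) follows from the definition (2.4) of $\mathcal{T}$, (ii) holds true because $\alpha$ is increasing and left continuous, while (iii) follows from (3.12).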

[(b) ⇔ (c)]. In order to work with usual convex conjugates instead of monotone conjugates, let us take $\alpha^*(s) = +\infty$ for all $s<0$. It follows that $\alpha$ is extended by $\alpha(t) = 0$ for all $t<0$ and $\alpha^\circledast(s) = \alpha^*(s)$ for all $s\ge0$.

Let us prove (c) ⇒ (b). With the above convention, statement (c) is equivalent to

$$\alpha(t)\le\Lambda_\phi^*(t), \quad \forall t\in\mathbb{R},\ \forall\phi\in\Phi. \qquad (3.13)$$

As $\Lambda_\phi$ is convex and lower semicontinuous, we have $\Lambda_\phi^{**} = \Lambda_\phi$. Hence, taking the convex conjugates on both sides of (3.13), one obtains that $\Lambda_\phi\le\alpha^*$, which entails (b).

Let us prove (b) ⇒ (c). As $\alpha$ is in $\mathcal{C}$, its extension (still denoted by $\alpha$) is convex and lower semicontinuous, so that $\alpha^{**} = \alpha$. Therefore, taking the conjugate of (b) leads to $\alpha\le\Lambda_\phi^*$, which is (c).

The convexity of $\alpha$ has been used to obtain (b) ⇒ (c) and it will not be used anywhere else.

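[(e) ⇒ (d) ⇒ (a)]. As (e) ⇒ (d) is obvious and (a) ⇔ (c), all we have to show is (d) ⇒ (c).

Let $m = EY = \langle\varphi+\psi,\mu\rangle$. For all $t\le m$, we have $\inf_{u>t}\Lambda_\phi^*(u) = \inf_{u\ge t}\Lambda_\phi^*(u) = 0$. As $\Lambda_\phi^*$ is convex, it is continuous on $(t_-,t_+)$, the interior of its effective domain. Therefore, we have for all $t\neq t_+$, $\inf_{u>t}\Lambda_\phi^*(u) = \inf_{u\ge t}\Lambda_\phi^*(u)$. Together with (3.11), this gives for all $t\neq t_+$

$$\lim_{n\to\infty}\frac1n\log P(T_n^\phi\ge t) = -\inf_{u\ge t}\Lambda_\phi^*(u) = -\Lambda_\phi^\circledast(t).$$

Consequently, considering $\Gamma(t) = \Lambda_\phi^\circledast(t)$ if $t\neq t_+$ and $\Gamma(t_+) = +\infty$ (if $t_+<\infty$), we have

$$\begin{aligned}
\text{(d)} &\implies\alpha(t)\le\Lambda_\phi^\circledast(t),\ \forall t\neq t_+\\
&\implies\alpha\le\Gamma\\
&\implies\operatorname{ls}\alpha\le\operatorname{ls}\Gamma\\
&\implies\alpha\le\Lambda_\phi^\circledast
\end{aligned}$$

where $\operatorname{ls}\alpha$ and $\operatorname{ls}\Gamma$ are the lower semicontinuous envelopes of $\alpha$ and $\Gamma$, and the last implication holds since $\alpha$ is lower semicontinuous and $\operatorname{ls}\Gamma = \Lambda_\phi^\circledast$. As $\Lambda_\phi^\circledast\le\Lambda_\phi^*$, we have the desired result.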

[(a) ⇔ (b) ⇔ (c) ⇔ (d) ⇔ (e)]. Let us assume (3.8). To obtain the stated series of equivalences, it remains to prove (c) ⇒ (e).

By (3.10), $T_n^\phi = \frac1n\sum_{i=1}^nY_i$ with $Y_i = \varphi(X_i)+E\psi(X_i)$. The standard proof of the upper bound of Cramér's theorem is based on an optimization of a collection of exponential Markov inequalities, as follows. For all real $t$, all $n$ and all $s\ge0$,

$$P(T_n^\phi\ge t) = P\Big(\exp\Big[s\sum_{i=1}^nY_i\Big]\ge e^{nst}\Big)\le e^{-nst}\,E\exp\Big[s\sum_{i=1}^nY_i\Big] = \exp[n(\Lambda_\phi(s)-st)].$$

Optimizing on $s\ge0$, one obtains that

$$\frac1n\log P(T_n^\phi\ge t)\le-\Lambda_\phi^\circledast(t), \quad \forall t\in\mathbb{R},\ \forall\phi\in\Phi,\ \forall n\ge1.$$

But assumption (3.8) implies that $m\le0$, so that $\Lambda_\phi^\circledast(t) = \Lambda_\phi^*(t)$ for all $t\ge0$. It follows immediately that (c) ⇒ (e). This completes the proof of the theorem. □
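The bound just obtained holds for every $n\ge1$, not only asymptotically. For illustration, assume the $Y_i$ are symmetric $\pm1$ variables, so that $m = 0$ and $\Lambda(s) = \log\cosh s$ (this choice is ours, not the paper's). The sketch below computes $P(T_n\ge t)$ exactly from the binomial distribution and compares $\frac1n\log P(T_n\ge t)$ with $-\Lambda^*(t)$ and with the weaker Hoeffding-type bound $-t^2/2$.

```python
import math
from fractions import Fraction

def rademacher_rate(t):
    # Lambda*(t) for P(Y = 1) = P(Y = -1) = 1/2, i.e. the conjugate of log cosh, for 0 <= t < 1
    return 0.5 * (1 + t) * math.log(1 + t) + 0.5 * (1 - t) * math.log(1 - t)

def log_tail_mean(n, t):
    # Exact log P((Y_1 + ... + Y_n) / n >= t): the sum equals 2K - n with K ~ Binomial(n, 1/2).
    # t is a Fraction so that the threshold on K is computed without rounding.
    k_min = math.ceil(n * (1 + t) / 2)
    prob = sum(math.comb(n, k) for k in range(k_min, n + 1)) / 2 ** n
    return math.log(prob)

for n in [1, 5, 20, 100]:
    for t in [Fraction(1, 10), Fraction(3, 10), Fraction(3, 5)]:
        lhs = log_tail_mean(n, t) / n
        print(n, float(t), lhs, -rademacher_rate(float(t)), -float(t) ** 2 / 2)
```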
3.4. Convex Transportation Cost Inequalities. In the special case of TCIs, we have $\Phi = \Phi_c = \{(\psi,\varphi);\ \psi,\varphi\in C_b(X):\ \psi\oplus\varphi\le c\}$. Optimal transportation theory (see [?]) indicates that $\Phi_c$ may be replaced with the smaller sets $\{(-\varphi,Q_c\varphi);\ \varphi\in C_b(X)\}$ or $\{(-\varphi,Q_c\varphi);\ \varphi$ lower semicontinuous and bounded on $X\}$, where

$$Q_c\varphi(y) = \inf_{x\in X}\{\varphi(x)+c(x,y)\}, \quad y\in X,$$

without any change in the value of $\mathcal{T}$. One easily proves that if (3.9) is satisfied, i.e. $c(x,x) = 0$ for all $x\in X$, then $\sup|Q_c\varphi|\le\sup|\varphi|$. If $c$ is continuous, then $Q_c\varphi$ is measurable as an upper semicontinuous function. If $c$ is only assumed to be lower semicontinuous, $Q_c\varphi$ is still measurable if $\varphi$ is lower semicontinuous and bounded (but the proof of this result is technical). Anyway, $Q_c\varphi\in B(X)$ (it is a bounded measurable function) as soon as $\varphi$ is lower semicontinuous and bounded. In particular, assumptions (3.5) and (3.6) hold with $\mathcal{F} = B(X)$.

Now, as a corollary of Theorem 3.7, we have the following result.

Corollary 3.14. Whenever $\alpha\in\mathcal{C}$, the transportation cost inequality (2.5) holds in $\mathcal{N} = \mathcal{P}(X)$ if and only if

$$\log\int_X e^{s(Q_c\varphi-\langle\varphi,\mu\rangle)}\,d\mu\le\alpha^\circledast(s)$$

for all $s\ge0$ and all $\varphi\in C_b(X)$.

If in addition $c$ is continuous, the same result holds when $\varphi\in C_b(X)$ is replaced with $\varphi\in B(X)$: the set of all measurable bounded functions on $X$.

3.5. Convex Norm-Entropy inequalities. In the special case of NEIs, we have $\Phi = \{(-\varphi,\varphi);\ \varphi\in U\}$ and Theorem 3.7 specializes as follows.

Theorem 3.15. Suppose that $U$ satisfies […]. Then, for any $\alpha\in\mathcal{C}$, the NEI (2.6) holds if and only if

$$\Lambda_\varphi(s) := \log\int_X e^{s(\varphi-\langle\varphi,\mu\rangle)}\,d\mu\le\alpha^\circledast(s) \qquad (3.16)$$

for all $s\ge0$ and all $\varphi\in U$.

Specializing Theorem 3.15 by taking $U$ to be the set of all 1-Lipschitz measurable bounded functions with respect to some measurable metric $d$, one obtains the following characterization of convex $T_1$-inequalities.

Theorem 3.17 ($T_1$-inequality). Let $d$ be a lower semicontinuous metric on $X$ such that

$$\int_X e^{a_0d(x_0,x)}\,\mu(dx)<\infty$$

for some $a_0>0$ and some (and therefore all) $x_0\in X$. Let $\alpha$ be in $\mathcal{C}$. Then,

$$\alpha(\mathcal{T}_d(\mu,\nu))\le H(\nu\mid\mu),$$

for all $\nu\in\mathcal{P}(X)$ such that $\int_X d(x_0,x)\,\nu(dx)<\infty$ if and only if

$$\Lambda_\varphi(s) := \log\int_X e^{s(\varphi(x)-\langle\varphi,\mu\rangle)}\,\mu(dx)\le\alpha^\circledast(s) \qquad (3.18)$$

for all $s\ge0$ and all measurable bounded Lipschitz functions $\varphi$ such that $\|\varphi\|_{Lip}\le1$.



The following simple result asserts that the functions $\alpha$ of NEIs cannot grow faster than $at^2$ for $t$ near zero.

Proposition 3.19. Assuming that $U$ contains functions which are not $\mu$-a.e. constant, the function $\alpha$ of a convex norm-entropy inequality (2.6) satisfies

$$0\le\alpha(t)\le at^2, \quad \forall\,0\le t\le t_1 \qquad (3.20)$$

for some $a>0$ and $t_1>0$.

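Proof. Let $\varphi_0$ be a function in $U$ which is not $\mu$-a.e. constant. Then, $\sigma^2 := \int_X(\varphi_0(x)-\langle\varphi_0,\mu\rangle)^2\,d\mu>0$ and for any $0<\sigma_1^2<\sigma^2$ there exists $s_1>0$ such that $\Lambda_{\varphi_0}(s) = \sigma^2s^2/2+o(s^2)\ge\sigma_1^2s^2/2$, for all $0\le s\le s_1$. Let $\theta_1(s)$ match with $\sigma_1^2s^2/2$ on $[0,s_1]$ and be extended on $[s_1,\infty)$ by the tangent affine function of $s\mapsto\sigma_1^2s^2/2$ at $s = s_1$. As $\Lambda_{\varphi_0}$ is convex, we have $\theta_1(s)\le\Lambda_{\varphi_0}(s)$ for all $s\ge0$.

Together with (3.16), we obtain $\theta_1\le\alpha^\circledast$. Taking the monotone conjugates on both sides of this inequality provides us with

$$\alpha(t)\le\theta_1^\circledast(t) = \begin{cases}t^2/(2\sigma_1^2), & \text{if } 0\le t\le s_1\sigma_1^2\\ +\infty, & \text{if } t>s_1\sigma_1^2\end{cases}$$

from which the desired result follows. □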

To explore some consequences of Theorem 13.151 (see Corollaries 13.231 and 13.241 below) one 
needs the notion of Orlicz space associated with the exponential function. It appears that 
the space ^ exp (/ i ) introduced at (|3.1|) is the Orlicz space 

ip : X — > R; measurable, / p(aip) dp, < oo for all a > 

Jx 

where //-almost equal functions are not identified and p is the Young function 

p( s ) = e |s| - 1, s E R. 

Its Orlicz norm is defined by 



IMI, := inf |&>0;jT d /i<lJ ( 3 - 21 ) 

= inf |& > 0; J e M/b dp<2^ 

and, considering the usual dual bracket $\langle\eta,\varphi\rangle=\int_{\mathcal{X}}\eta\varphi\,d\mu$, its topological dual space is isomorphic to

$$L_{\rho^*}(\mu)=\Big\{\eta:\mathcal{X}\to\mathbb{R}\ \text{measurable};\ \int_{\mathcal{X}}\rho^*(a\eta)\,d\mu<\infty\ \text{for some } a>0\Big\}=\Big\{\eta:\mathcal{X}\to\mathbb{R}\ \text{measurable};\ \int_{\mathcal{X}}|\eta|\log|\eta|\,d\mu<\infty\Big\}$$

where $\rho^*$ is the convex conjugate of $\rho$:

$$\rho^*(t)=\begin{cases}|t|\log|t|-|t|+1, & \text{if } |t|\ge1,\\ 0, & \text{if } |t|<1,\end{cases}$$

and $\mu$-almost equal functions are identified. Note that the effective domain of $H(\cdot\mid\mu)$ is included in the set of all probability measures $\nu$ which are absolutely continuous with respect to $\mu$ and such that $\frac{d\nu}{d\mu}\in L_{\rho^*}(\mu)$.
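For a reference measure with finitely many atoms, the infimum in (3.21) can be computed by bisection, because $b\mapsto\int e^{|\varphi|/b}\,d\mu$ is decreasing with limit $1$ as $b\to\infty$. The following Python sketch assumes NumPy, weights summing to one, and a helper name `orlicz_norm` of our own choosing; for the constant function $1$ it should return $1/\log2$.

```python
import numpy as np

def orlicz_norm(phi, w, rel_tol=1e-12):
    """||phi||_rho = inf{b > 0 : sum_i w_i exp(|phi_i| / b) <= 2} for the discrete
    probability measure mu = sum_i w_i delta_{x_i} (assumption: weights sum to 1)."""
    a = np.abs(np.asarray(phi, dtype=float))
    w = np.asarray(w, dtype=float)
    keep = w > 0                        # atoms of zero mass play no role
    a, w = a[keep], w[keep]
    if not np.any(a > 0):
        return 0.0                      # the constraint holds for every b > 0

    def integral(b):
        with np.errstate(over="ignore"):    # exp may overflow to +inf for tiny b
            return float(np.sum(w * np.exp(a / b)))

    # b -> integral(b) decreases to 1, so bracket the threshold, then bisect.
    lo, hi = 0.0, 1.0
    while integral(hi) > 2.0:
        lo, hi = hi, 2.0 * hi
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if integral(mid) <= 2.0:
            hi = mid
        else:
            lo = mid
    return hi

print(orlicz_norm([1.0], [1.0]), 1 / np.log(2))         # constant function 1
print(orlicz_norm([0.0, 1.0, 2.0, 5.0], [0.4, 0.3, 0.2, 0.1]))
```

Bisection is used rather than a closed form because the defining equation $\int e^{|\varphi|/b}\,d\mu=2$ has no explicit solution in general; homogeneity $\|\lambda\varphi\|_\rho=|\lambda|\,\|\varphi\|_\rho$ gives an easy consistency check.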

Let us state a useful technical lemma, which will play a role similar to the one played by Hoeffding's inequality in the proof of the CKP inequality (2.9).

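Lemma 3.22 (A Bernstein type inequality). For any measurable function $\varphi$ such that $\int_{\mathcal{X}}e^{a_0|\varphi|}\,d\mu<\infty$ for some $a_0>0$, we have $\|\varphi\|_\rho<\infty$ and

$$\Lambda_\varphi(s)\le\frac{\|\varphi\|_\rho^2\,s^2}{1-\|\varphi\|_\rho\,s},\qquad\forall\,0\le s<1/\|\varphi\|_\rho.$$

It follows that, if $U$ is a uniformly $\|\cdot\|_\rho$-bounded set of functions: $\sup_{\varphi\in U}\|\varphi\|_\rho\le M<\infty$, then

$$\Lambda_\varphi(s)\le\frac{M^2s^2}{1-Ms},\qquad\forall\,0\le s<1/M,\ \forall\varphi\in U.$$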

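Proof. By the definition of $\beta:=\|\varphi\|_\rho$, we have $1\ge\int_{\mathcal{X}}\rho(\varphi/\beta)\,d\mu=\sum_{k\ge1}\langle|\varphi|^k,\mu\rangle/(k!\,\beta^k)$. Therefore, for all $k\ge1$, $\langle|\varphi|^k,\mu\rangle\le k!\,\beta^k$. It follows that for all $s\ge0$,

$$\begin{aligned}\Lambda_\varphi(s)&=\log\Big[1+\sum_{k\ge1}s^k\langle\varphi^k,\mu\rangle/k!\Big]-s\langle\varphi,\mu\rangle\\&\le\sum_{k\ge2}s^k\langle\varphi^k,\mu\rangle/k!\\&\le\sum_{k\ge2}s^k\langle|\varphi|^k,\mu\rangle/k!\\&\le\sum_{k\ge2}(\beta s)^k=\begin{cases}(\beta s)^2/(1-\beta s), & \text{if } 0\le\beta s<1,\\ +\infty, & \text{if } \beta s\ge1,\end{cases}\end{aligned}$$

where the first inequality uses $\log(1+x)\le x$. The last statement holds since $\beta\mapsto\sum_{k\ge2}(\beta s)^k$ is an increasing function, for all $s\ge0$. □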
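Lemma 3.22 can be checked numerically on a discrete example. The sketch below (Python with NumPy; the function values, the weights and the helper names are hypothetical illustrations) computes $\beta=\|\varphi\|_\rho$ by bisection as in (3.21), evaluates $\Lambda_\varphi$ on a grid of $[0,0.99/\beta]$ and compares it with $(\beta s)^2/(1-\beta s)$.

```python
import numpy as np

def orlicz_norm(phi, w, rel_tol=1e-12):
    """Bisection for ||phi||_rho of (3.21); w: positive weights summing to 1."""
    a = np.abs(np.asarray(phi, dtype=float))
    w = np.asarray(w, dtype=float)
    if not np.any(a > 0):
        return 0.0

    def integral(b):
        with np.errstate(over="ignore"):
            return float(np.sum(w * np.exp(a / b)))

    lo, hi = 0.0, 1.0
    while integral(hi) > 2.0:          # bracket the threshold
        lo, hi = hi, 2.0 * hi
    while hi - lo > rel_tol * hi:      # bisection, keeping integral(hi) <= 2
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if integral(mid) <= 2.0 else (mid, hi)
    return hi

def log_laplace(phi, w, s):
    """Lambda_phi(s) = log int exp(s (phi - <phi, mu>)) d mu for the discrete measure mu."""
    phi, w = np.asarray(phi, dtype=float), np.asarray(w, dtype=float)
    return float(np.log(np.dot(w, np.exp(s * (phi - np.dot(w, phi))))))

phi = [0.0, 1.0, 2.0, 5.0]              # illustrative function values (assumption)
w = [0.4, 0.3, 0.2, 0.1]                # illustrative weights (assumption)
beta = orlicz_norm(phi, w)
s_values = np.linspace(0.0, 0.99 / beta, 200)
lhs = np.array([log_laplace(phi, w, s) for s in s_values])
rhs = (beta * s_values) ** 2 / (1.0 - beta * s_values)    # bound of Lemma 3.22
print(beta, float(np.max(lhs - rhs)))   # the largest gap should be <= 0
```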

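We are now ready to prove some corollaries of Theorem 3.7. For any measurable function $f$ in $L_{\rho^*}(\mu)$, let

$$\|f\|_\rho^*:=\sup\Big\{\int_{\mathcal{X}}f\varphi\,d\mu;\ \varphi\ \text{measurable},\ \|\varphi\|_\rho\le1\Big\}=\sup\Big\{\int_{\mathcal{X}}f\varphi\,d\mu;\ \varphi\ \text{measurable},\ \int_{\mathcal{X}}e^{|\varphi|}\,d\mu\le2\Big\}$$

be the dual norm of $\|\cdot\|_\rho$.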
Corollary 3.23. For any probability measure $\nu$ which is absolutely continuous with respect to $\mu$ and such that $\frac{d\nu}{d\mu}\in L_{\rho^*}(\mu)$, we have

$$\Big\|\frac{d\nu}{d\mu}-1\Big\|_\rho^*\le2\sqrt{H(\nu\mid\mu)}+H(\nu\mid\mu).$$

Note that this is the NEI $\alpha_1\big(\|\frac{d\nu}{d\mu}-1\|_\rho^*\big)\le H(\nu\mid\mu)$, with $\alpha_1(t)=(\sqrt{t+1}-1)^2$.

Proof. Here $U$ is the unit ball of $\mathcal{F}_{\exp}(\mu)$ and, thanks to Lemma 3.22 applied with $M=1$, (3.16) holds as follows: $\Lambda_\varphi(s)\le\alpha_1^\circledast(s):=s^2/(1-s)$. Taking the monotone conjugate, we obtain $\alpha_1(t)=(\sqrt{t+1}-1)^2$, which is the desired result. □
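The monotone conjugate used in this proof can be verified numerically: $\sup_{0\le s<1}\{st-s^2/(1-s)\}$ is attained at $s=1-1/\sqrt{1+t}$ and equals $(\sqrt{t+1}-1)^2$. The Python sketch below (assuming NumPy; the grid size is an arbitrary choice) compares a grid maximum with this closed form and also checks that $\alpha_1(2\sqrt{H}+H)=H$, which is how the NEI turns into the explicit bound of Corollary 3.23.

```python
import numpy as np

# alpha_1 conjugate: s^2 / (1 - s) on [0, 1), and +infinity for s >= 1.
s = np.linspace(0.0, 1.0, 1_000_001, endpoint=False)
alpha1_conj = s**2 / (1.0 - s)

def alpha1_numeric(t):
    """Grid approximation of sup_{0 <= s < 1} {s t - s^2 / (1 - s)}."""
    return float(np.max(s * t - alpha1_conj))

def alpha1_closed(t):
    """Closed form (sqrt(t + 1) - 1)^2 of the monotone conjugate."""
    return (np.sqrt(t + 1.0) - 1.0) ** 2

for t in (0.01, 0.5, 3.0, 10.0):
    print(t, alpha1_numeric(t), alpha1_closed(t))

# alpha_1(t) <= H is equivalent to t <= 2 sqrt(H) + H (the bound of Corollary 3.23).
for H in (0.01, 0.3, 2.0):
    print(H, alpha1_closed(2 * np.sqrt(H) + H))   # gives back H
```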

The following corollary has already been obtained by F. Bolley and C. Villani in 0] with 
other constants. 

Corollary 3.24 (Weighted CKP inequalities). Let $\chi$ be a nonnegative function such that $\int_{\mathcal{X}}e^{a_0\chi}\,d\mu<\infty$ for some $a_0>0$. Then $\|\chi\|_\rho<\infty$ and, for any probability measure $\nu$ which is absolutely continuous with respect to $\mu$ and such that $\frac{d\nu}{d\mu}\in L_{\rho^*}(\mu)$, $\|\chi\cdot(\nu-\mu)\|_{TV}$ is well defined, finite and we have

$$\|\chi\cdot(\nu-\mu)\|_{TV}\le\|\chi\|_\rho\big(2\sqrt{H(\nu\mid\mu)}+H(\nu\mid\mu)\big).$$

Note that this is the NEI $\alpha(\|\chi\cdot(\nu-\mu)\|_{TV})\le H(\nu\mid\mu)$, with $\alpha(t)=\big(\sqrt{t/\|\chi\|_\rho+1}-1\big)^2$.

Proof. Here $U=\{\chi\psi;\ \sup|\psi|\le1\}$. As $\chi$ may not be in $\mathcal{F}_{\exp}(\mu)$ (if there exists $a_1>0$ such that $\int_{\mathcal{X}}e^{a_1\chi}\,d\mu=\infty$), one must be careful. It happens that

$$\begin{aligned}\|\chi\cdot(\nu-\mu)\|_{TV}&=\sup\Big\{\int_{\mathcal{X}}\chi\psi\,d(\nu-\mu);\ \psi\ \text{measurable},\ \sup|\psi|\le1\Big\}\\&=\sup\Big\{\int_{\mathcal{X}}\varphi\,d(\nu-\mu);\ \varphi\ \text{measurable},\ |\varphi|\le\chi,\ \sup|\varphi|<\infty\Big\}.\end{aligned}$$

To show this, decompose $\nu-\mu$ into its positive and negative parts, approximate $\chi\mathbf{1}_{\mathrm{supp}((\nu-\mu)_+)}$ and $\chi\mathbf{1}_{\mathrm{supp}((\nu-\mu)_-)}$ from below by pointwise converging sequences of bounded functions, and conclude with the dominated convergence theorem.

Therefore, $U$ can be replaced with $U'=\{\varphi;\ |\varphi|\le\chi,\ \sup|\varphi|<\infty\}\subset\mathcal{F}_{\exp}(\mu)$. As $\sup_{\varphi\in U'}\|\varphi\|_\rho\le\|\chi\|_\rho$, thanks to Lemma 3.22 applied with $M=\|\chi\|_\rho$, (3.16) holds as follows: $\Lambda_\varphi(s)\le\alpha_M^\circledast(s):=(Ms)^2/(1-Ms)$. Taking the monotone conjugate, we obtain $\alpha_M(t)=(\sqrt{t/M+1}-1)^2$, which is the desired result. □
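To see Corollary 3.24 and Remark 3.25 at work, the following Python sketch evaluates both sides on a five-point space. The measures $\mu$, $\nu$ and the weight $\chi$ are arbitrary illustrative data, the Orlicz norm is computed by bisection as in (3.21), and for comparison we use the CKP inequality in its usual form $\|\nu-\mu\|_{TV}\le\sqrt{2H(\nu\mid\mu)}$ (total variation norm, not half of it).

```python
import numpy as np

def orlicz_norm(phi, w, rel_tol=1e-12):
    """Bisection for ||phi||_rho of (3.21); w: positive weights summing to 1."""
    a = np.abs(np.asarray(phi, dtype=float))
    w = np.asarray(w, dtype=float)
    if not np.any(a > 0):
        return 0.0

    def integral(b):
        with np.errstate(over="ignore"):
            return float(np.sum(w * np.exp(a / b)))

    lo, hi = 0.0, 1.0
    while integral(hi) > 2.0:
        lo, hi = hi, 2.0 * hi
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if integral(mid) <= 2.0 else (mid, hi)
    return hi

def relative_entropy(nu, mu):
    """H(nu | mu) = sum nu log(nu / mu) for discrete nu absolutely continuous w.r.t. mu."""
    m = nu > 0
    return float(np.sum(nu[m] * np.log(nu[m] / mu[m])))

mu = np.full(5, 0.2)                          # illustrative data (assumption)
nu = np.array([0.1, 0.1, 0.2, 0.3, 0.3])
chi = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
H = relative_entropy(nu, mu)

weighted_tv = float(np.sum(chi * np.abs(nu - mu)))             # ||chi . (nu - mu)||_TV
weighted_bound = orlicz_norm(chi, mu) * (2 * np.sqrt(H) + H)   # Corollary 3.24
print(weighted_tv, weighted_bound)

tv = float(np.sum(np.abs(nu - mu)))                             # the case chi = 1
print(tv, np.sqrt(2 * H), (2 * np.sqrt(H) + H) / np.log(2))     # CKP vs Remark 3.25
```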

Remark 3.25. It follows from Corollaries 3.23 and 3.24 (take $\chi\equiv1$, for which $\|1\|_\rho=1/\log2$) that

$$\|\nu-\mu\|_{TV}\le\frac{1}{\log2}\big(2\sqrt{H(\nu\mid\mu)}+H(\nu\mid\mu)\big),$$

which of course is worse than the CKP inequality (2.9) but has the same order of growth $\sqrt{H}$ for vanishing entropies.

Let $d$ be a metric on $\mathcal{X}$. The associated dual Lipschitz norm of any bounded signed measure $\xi$ with zero mass is defined by

$$\|\xi\|_{\mathrm{Lip}}^*=\sup\Big\{\int_{\mathcal{X}}\varphi\,d\xi;\ \varphi\ \text{measurable},\ \|\varphi\|_{\mathrm{Lip}}\le1,\ \sup|\varphi|<\infty\Big\}$$

where $\|\varphi\|_{\mathrm{Lip}}=\sup_{x\ne y}\frac{|\varphi(x)-\varphi(y)|}{d(x,y)}$ is the usual Lipschitz seminorm.

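Corollary 3.26. Suppose that there exist $a_0>0$ and $x_0\in\mathcal{X}$ such that $\int_{\mathcal{X}}e^{a_0d(x_0,x)}\,\mu(dx)<\infty$. Then $\|d\|_{\rho,\mu\otimes\mu}=\inf\{b>0;\ \int_{\mathcal{X}\times\mathcal{X}}e^{d(x,y)/b}\,\mu(dx)\mu(dy)\le2\}<\infty$ and

$$\|\nu-\mu\|_{\mathrm{Lip}}^*\le\|d\|_{\rho,\mu\otimes\mu}\big(2\sqrt{H(\nu\mid\mu)}+H(\nu\mid\mu)\big),\qquad\forall\nu\in\mathcal{P}(\mathcal{X}).$$

Note that this is the NEI $\alpha(\|\nu-\mu\|_{\mathrm{Lip}}^*)\le H(\nu\mid\mu)$, with $\alpha(t)=\big(\sqrt{t/\|d\|_{\rho,\mu\otimes\mu}+1}-1\big)^2$.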

Proof. This is a corollary of Theorem 3.17. Here $U=\{\varphi;\ \|\varphi\|_{\mathrm{Lip}}\le1,\ \sup|\varphi|<\infty\}\subset\mathcal{F}_{\exp}(\mu)$. Let us show that

$$\sup_{\varphi\in U}\|\varphi-\langle\varphi,\mu\rangle\|_\rho\le\|d\|_{\rho,\mu\otimes\mu}.\qquad(3.27)$$

By Jensen's inequality, for any 1-Lipschitz function $\varphi$ and all $s\ge0$,

$$\exp\Big[s\Big|\varphi(x)-\int_{\mathcal{X}}\varphi(y)\,\mu(dy)\Big|\Big]\le\int_{\mathcal{X}}\exp\big[s|\varphi(x)-\varphi(y)|\big]\,\mu(dy)\le\int_{\mathcal{X}}\exp[s\,d(x,y)]\,\mu(dy).$$

Hence, integrating with respect to $\mu(dx)$ with $s=1/b$, one obtains (3.27).

Thanks to Lemma 3.22 applied with $M=\|d\|_{\rho,\mu\otimes\mu}$, (3.18) holds as follows: $\Lambda_\varphi(s)\le\alpha_M^\circledast(s):=(Ms)^2/(1-Ms)$. Taking the monotone conjugate, we obtain $\alpha_M(t)=(\sqrt{t/M+1}-1)^2$, which is the desired result. □

4. Tensorization of convex TCIs 

In this section only convex TCIs are considered. It is assumed that the state spaces involved are Polish and that the cost functions involved are nonnegative, continuous and satisfy (3.9).

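4.1. Statement of the main result. Let $\mu_1,\mu_2$ be two probability measures on two Polish spaces $\mathcal{X}_1,\mathcal{X}_2$, respectively. The cost functions $c_1(x_1,y_1)$ and $c_2(x_2,y_2)$ on $\mathcal{X}_1\times\mathcal{X}_1$ and $\mathcal{X}_2\times\mathcal{X}_2$ give rise to the optimal transportation cost functions $\mathcal{T}_{c_1}(\mu_1,\nu_1)$, $\nu_1\in\mathcal{P}(\mathcal{X}_1)$, and $\mathcal{T}_{c_2}(\mu_2,\nu_2)$, $\nu_2\in\mathcal{P}(\mathcal{X}_2)$.

On the product space $\mathcal{X}_1\times\mathcal{X}_2$, we now consider the product measure $\mu_1\otimes\mu_2$ and the cost function

$$c_1\oplus c_2((x_1,y_1),(x_2,y_2)):=c_1(x_1,y_1)+c_2(x_2,y_2),\qquad x_1,y_1\in\mathcal{X}_1,\ x_2,y_2\in\mathcal{X}_2,$$

which gives rise to the so-called tensorized optimal transportation cost function

$$\mathcal{T}_{c_1\oplus c_2}(\mu_1\otimes\mu_2,\nu),\qquad\nu\in\mathcal{P}(\mathcal{X}_1\times\mathcal{X}_2).$$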

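Recall that the inf-convolution of two functions $\alpha_1$ and $\alpha_2$ on $[0,\infty)$ is defined by

$$\alpha_1\square\alpha_2(t)=\inf\{\alpha_1(t_1)+\alpha_2(t_2);\ t_1,t_2\ge0:\ t_1+t_2=t\},\qquad t\ge0.$$

Lemma 4.1. Let $\alpha_1$ and $\alpha_2$ belong to the class $\mathcal{C}$. Then,

(a) $\alpha_1\square\alpha_2\in\mathcal{C}$ and

(b) $(\alpha_1\square\alpha_2)^\circledast=\alpha_1^\circledast+\alpha_2^\circledast$.

Proof. This simple exercise is left to the reader. □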
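Lemma 4.1(b) can be illustrated on quadratic functions: if $\alpha_i(t)=t^2/(2c_i)$ then $\alpha_i^\circledast(s)=c_is^2/2$ and $\alpha_1\square\alpha_2(t)=t^2/(2(c_1+c_2))$. The Python sketch below (assuming NumPy; the constants $c_1,c_2$ and the helper names are ours) computes the inf-convolution and its monotone conjugate on grids and compares them with these closed forms.

```python
import numpy as np

def inf_convolution(a1, a2, t, n=2001):
    """Grid approximation of (a1 box a2)(t) = inf{a1(t1) + a2(t - t1); 0 <= t1 <= t}."""
    t1 = np.linspace(0.0, t, n)
    return float(np.min(a1(t1) + a2(t - t1)))

def monotone_conjugate(values, s, t_grid):
    """Grid approximation of sup_{t >= 0} {s t - f(t)} from the samples values = f(t_grid)."""
    return float(np.max(s * t_grid - values))

c1, c2 = 1.0, 3.0                          # illustrative constants (assumption)
alpha1 = lambda t: t**2 / (2 * c1)          # its conjugate is c1 s^2 / 2
alpha2 = lambda t: t**2 / (2 * c2)          # its conjugate is c2 s^2 / 2

t_grid = np.linspace(0.0, 20.0, 2001)
box = np.array([inf_convolution(alpha1, alpha2, t) for t in t_grid])

# The inf-convolution of the two quadratics is t^2 / (2 (c1 + c2)).
print(float(np.max(np.abs(box - t_grid**2 / (2 * (c1 + c2))))))

# Lemma 4.1(b): the conjugate of the inf-convolution is the sum of the conjugates.
for s in (0.5, 1.0, 2.0):
    print(s, monotone_conjugate(box, s, t_grid), (c1 + c2) * s**2 / 2)
```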
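The main result of this section is the following theorem.

Theorem 4.2 (Tensorization). Let $c_1$ and $c_2$ be two continuous nonnegative cost functions which satisfy (3.9). Suppose that the convex TCIs

$$\alpha_1(\mathcal{T}_{c_1}(\mu_1,\nu_1))\le H(\nu_1\mid\mu_1),\qquad\forall\nu_1\in\mathcal{P}(\mathcal{X}_1)$$

$$\alpha_2(\mathcal{T}_{c_2}(\mu_2,\nu_2))\le H(\nu_2\mid\mu_2),\qquad\forall\nu_2\in\mathcal{P}(\mathcal{X}_2)$$

hold with $\alpha_1,\alpha_2\in\mathcal{C}$. Then, on the product space $\mathcal{X}_1\times\mathcal{X}_2$, we have the convex TCI

$$\alpha_1\square\alpha_2\big(\mathcal{T}_{c_1\oplus c_2}(\mu_1\otimes\mu_2,\nu)\big)\le H(\nu\mid\mu_1\otimes\mu_2),\qquad\forall\nu\in\mathcal{P}(\mathcal{X}_1\times\mathcal{X}_2).$$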

Its proof is postponed to Section 4.3. We prefer to begin, in the next section, with an incomplete derivation of this result which, in our opinion, is more intuitively appealing.

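4.2. An incomplete direct proof of Theorem 4.2. By means of Marton's coupling argument ^3], one can expect to prove the next Proposition 4.3. We are interested in transportation costs from $\mathcal{X}_1$ to $\mathcal{Y}_1$, from $\mathcal{X}_2$ to $\mathcal{Y}_2$ and from $\mathcal{X}_1\times\mathcal{X}_2$ to $\mathcal{Y}_1\times\mathcal{Y}_2$. For any probability measure $\nu$ on the product space $\mathcal{Y}=\mathcal{Y}_1\times\mathcal{Y}_2$, let us write the disintegration of $\nu$ (conditional expectation) as follows: $\nu(dy_1dy_2)=\nu_1(dy_1)\nu_2^{y_1}(dy_2)$.

Proposition 4.3. For all $\nu\in\mathcal{P}(\mathcal{Y}_1\times\mathcal{Y}_2)$,

$$\mathcal{T}_{c_1\oplus c_2}(\mu_1\otimes\mu_2,\nu)\le\mathcal{T}_{c_1}(\mu_1,\nu_1)+\int_{\mathcal{Y}_1}\mathcal{T}_{c_2}(\mu_2,\nu_2^{y_1})\,\nu_1(dy_1).\qquad(4.4)$$

Recall that the relative entropy satisfies, for all $\nu\in\mathcal{P}(\mathcal{Y}_1\times\mathcal{Y}_2)$,

$$H(\nu\mid\mu_1\otimes\mu_2)=H(\nu_1\mid\mu_1)+\int_{\mathcal{Y}_1}H(\nu_2^{y_1}\mid\mu_2)\,\nu_1(dy_1),\qquad(4.5)$$

which looks like (4.4).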

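Admitting Proposition 4.3 for a while, one can easily derive Theorem 4.2 as follows. Take $\mathcal{Y}_1=\mathcal{X}_1$ and $\mathcal{Y}_2=\mathcal{X}_2$. For all $\nu\in\mathcal{P}(\mathcal{X}_1\times\mathcal{X}_2)$,

$$\begin{aligned}\alpha_1\square\alpha_2\big(\mathcal{T}_{c_1\oplus c_2}(\mu_1\otimes\mu_2,\nu)\big)&\overset{(a)}{\le}\alpha_1\square\alpha_2\Big(\mathcal{T}_{c_1}(\mu_1,\nu_1)+\int\mathcal{T}_{c_2}(\mu_2,\nu_2^{y_1})\,\nu_1(dy_1)\Big)\\&\overset{(b)}{\le}\alpha_1\big(\mathcal{T}_{c_1}(\mu_1,\nu_1)\big)+\alpha_2\Big(\int\mathcal{T}_{c_2}(\mu_2,\nu_2^{y_1})\,\nu_1(dy_1)\Big)\\&\overset{(c)}{\le}\alpha_1\big(\mathcal{T}_{c_1}(\mu_1,\nu_1)\big)+\int\alpha_2\big(\mathcal{T}_{c_2}(\mu_2,\nu_2^{y_1})\big)\,\nu_1(dy_1)\\&\overset{(d)}{\le}H(\nu_1\mid\mu_1)+\int H(\nu_2^{y_1}\mid\mu_2)\,\nu_1(dy_1)\\&=H(\nu\mid\mu_1\otimes\mu_2).\end{aligned}$$

Inequality (a) holds thanks to Proposition 4.3 since $\alpha_1\square\alpha_2$ is increasing, (b) follows from the very definition of the inf-convolution, (c) follows from Jensen's inequality since $\alpha_2$ is convex, (d) follows from the assumptions $\alpha_1(\mathcal{T}_1(\nu_1))\le H(\nu_1\mid\mu_1)$ for all $\nu_1$ and $\alpha_2(\mathcal{T}_2(\nu_2))\le H(\nu_2\mid\mu_2)$ for all $\nu_2$ (with obvious notations), and the last equality is (4.5).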
To complete the proof of Theorem 4.2, it remains to prove Proposition 4.3. This won't be achieved completely: a difficult measurability statement will only be conjectured.

Incomplete proof of Proposition 4.3. One first faces a nightmare of notations. It might be helpful to introduce random variables and see $\pi\in\mathcal{P}(\mathcal{X}\times\mathcal{Y})=\mathcal{P}(\mathcal{X}_1\times\mathcal{X}_2\times\mathcal{Y}_1\times\mathcal{Y}_2)$ as the law of $(X_1,X_2,Y_1,Y_2)$. One denotes $\pi_1=\mathcal{L}(X_1,Y_1)$, $\pi_2^{x_1,y_1}=\mathcal{L}(X_2,Y_2\mid X_1=x_1,Y_1=y_1)$, $\pi_{X_2}^{x_1,y_1}=\mathcal{L}(X_2\mid X_1=x_1,Y_1=y_1)$, $\pi_{Y_2}^{x_1,y_1}=\mathcal{L}(Y_2\mid X_1=x_1,Y_1=y_1)$, $\pi_X=\mathcal{L}(X_1,X_2)$, $\pi_Y=\mathcal{L}(Y_1,Y_2)$ and so on.

Let us denote by $\mathcal{P}(\mu,\nu)$ the set of all $\pi\in\mathcal{P}(\mathcal{X}\times\mathcal{Y})$ such that $\pi_X=\mu$ and $\pi_Y=\nu$, by $\mathcal{P}_1(\mu_1,\nu_1)$ the set of all $\eta\in\mathcal{P}(\mathcal{X}_1\times\mathcal{Y}_1)$ such that $\eta_{X_1}=\mu_1$ and $\eta_{Y_1}=\nu_1$, and by $\mathcal{P}_2(\mu_2,\nu_2)$ the set of all $\eta\in\mathcal{P}(\mathcal{X}_2\times\mathcal{Y}_2)$ such that $\eta_{X_2}=\mu_2$ and $\eta_{Y_2}=\nu_2$.

We only consider couplings $\pi$ such that under the law $\pi$

• $\mathcal{L}(X_1,X_2)=\mu$,

• $\mathcal{L}(Y_1,Y_2)=\nu$,

• $Y_1$ and $X_2$ are independent conditionally on $X_1$, and

• $X_1$ and $Y_2$ are independent conditionally on $Y_1$.

Optimizing over this collection of couplings leads us to

$$\mathcal{T}_c(\mu,\nu)\le\inf_{\pi_1,\pi_2^{\cdot}}\int c_1\oplus c_2(x_1,y_1,x_2,y_2)\,\pi_1(dx_1dy_1)\pi_2^{x_1,y_1}(dx_2dy_2)$$

where the infimum is taken over all $\pi_1\in\mathcal{P}_1(\mu_1,\nu_1)$ and all $\pi_2^{x_1,y_1}\in\mathcal{P}_2(\mu_{X_2}^{x_1},\nu_{Y_2}^{y_1})$ for $\pi_1$-almost every $(x_1,y_1)$. As $\mu$ is a tensor product, $\mu=\mu_1\otimes\mu_2$, we have $\mu_{X_2}^{x_1}=\mu_2$, $\pi_1$-a.e., so that $\pi_2^{x_1,y_1}\in\mathcal{P}_2(\mu_2,\nu_2^{y_1})$ for $\pi_1$-almost every $(x_1,y_1)$.

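Not being careful, one may write

$$\begin{aligned}\mathcal{T}_c(\mu,\nu)&\le\inf_{\pi_1,\pi_2^{\cdot}}\int c_1\oplus c_2(x_1,y_1,x_2,y_2)\,\pi_1(dx_1dy_1)\pi_2^{x_1,y_1}(dx_2dy_2)\\&=\inf_{\pi_1}\Big\{\int c_1\,d\pi_1+\inf_{\pi_2^{\cdot}}\int_{\mathcal{X}_1\times\mathcal{Y}_1}\Big(\int_{\mathcal{X}_2\times\mathcal{Y}_2}c_2(x_2,y_2)\,\pi_2^{x_1,y_1}(dx_2dy_2)\Big)\pi_1(dx_1dy_1)\Big\}\\&\overset{(a)}{=}\inf_{\pi_1}\Big\{\int c_1\,d\pi_1+\int_{\mathcal{X}_1\times\mathcal{Y}_1}\Big(\int_{\mathcal{X}_2\times\mathcal{Y}_2}c_2\,d\hat\pi_2^{x_1,y_1}\Big)\pi_1(dx_1dy_1)\Big\}\\&=\inf_{\pi_1}\Big\{\int c_1\,d\pi_1+\int_{\mathcal{X}_1\times\mathcal{Y}_1}\mathcal{T}_{c_2}(\mu_2,\nu_2^{y_1})\,\pi_1(dx_1dy_1)\Big\}\\&=\inf_{\pi_1}\int c_1\,d\pi_1+\int_{\mathcal{Y}_1}\mathcal{T}_{c_2}(\mu_2,\nu_2^{y_1})\,\nu_1(dy_1)\\&=\mathcal{T}_{c_1}(\mu_1,\nu_1)+\int_{\mathcal{Y}_1}\mathcal{T}_{c_2}(\mu_2,\nu_2^{y_1})\,\nu_1(dy_1),\end{aligned}$$

which is the desired result.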

On the right-hand side of equality (a), $\hat\pi_2^{x_1,y_1}$ is a minimizer of $\pi_2^{x_1,y_1}\mapsto\int_{\mathcal{X}_2\times\mathcal{Y}_2}c_2\,d\pi_2^{x_1,y_1}$ subject to the constraint $\pi_2^{x_1,y_1}\in\mathcal{P}_2(\mu_2,\nu_2^{y_1})$. The general theory of optimal transportation ensures that such a minimizer exists for each $(x_1,y_1)$, and it might seem that the work is done.
But this is not true, since one still has to prove that there exists a measurable mapping $(x_1,y_1)\mapsto\hat\pi_2^{x_1,y_1}$. We now face a difficult problem that may possibly be solved by means of a measurable selection theorem, taking advantage of the pleasant property of tightness of any probability measure on a Polish space.

We withdraw this promising direct approach. □

4.3. A complete indirect proof of Theorem 4.2. It is based upon an indirect dual approach, making use of the characterization of Corollary 3.14, and follows the line of proof of ([H], Proposition 1.19).



Proof of Theorem 4.2. Recall that, provided that $c$ is continuous, nonnegative and satisfies (3.9), $Q_c\varphi(x)=\inf_{y\in\mathcal{X}}\{\varphi(y)+c(y,x)\}$ is in $B(\mathcal{X})$ whenever $\varphi\in B(\mathcal{X})$. We denote $Q_1=Q_{c_1}$, $Q_2=Q_{c_2}$ and $Q=Q_{c_1\oplus c_2}$.

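By Corollary 3.14, the convex TCIs "$\alpha_1(\mathcal{T}_1)\le H_1$" and "$\alpha_2(\mathcal{T}_2)\le H_2$", which are supposed to hold, are equivalent to

$$\int_{\mathcal{X}_1}e^{sQ_1\theta_1}\,d\mu_1\le\exp\big(\alpha_1^\circledast(s)+s\langle\theta_1,\mu_1\rangle\big),\qquad\forall s\ge0,\ \forall\theta_1\in B(\mathcal{X}_1)\qquad(4.6)$$

$$\int_{\mathcal{X}_2}e^{sQ_2\theta_2}\,d\mu_2\le\exp\big(\alpha_2^\circledast(s)+s\langle\theta_2,\mu_2\rangle\big),\qquad\forall s\ge0,\ \forall\theta_2\in B(\mathcal{X}_2)\qquad(4.7)$$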

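As $(\alpha_1\square\alpha_2)^\circledast=\alpha_1^\circledast+\alpha_2^\circledast$ by Lemma 4.1, thanks to Corollary 3.14 again, all we have to prove is

$$\int_{\mathcal{X}_1\times\mathcal{X}_2}e^{sQ\varphi}\,d(\mu_1\otimes\mu_2)\le\exp\big(\alpha_1^\circledast(s)+\alpha_2^\circledast(s)+s\langle\varphi,\mu_1\otimes\mu_2\rangle\big),\qquad\forall s\ge0,\ \forall\varphi\in C_b(\mathcal{X}_1\times\mathcal{X}_2)\qquad(4.8)$$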

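Let us take $\varphi\in C_b(\mathcal{X}_1\times\mathcal{X}_2)$. For all $(x_1,x_2)\in\mathcal{X}_1\times\mathcal{X}_2$,

$$\begin{aligned}Q\varphi(x_1,x_2)&=\inf_{y_1\in\mathcal{X}_1,\,y_2\in\mathcal{X}_2}\{\varphi(y_1,y_2)+c_1(y_1,x_1)+c_2(y_2,x_2)\}\\&=\inf_{y_1\in\mathcal{X}_1}\Big\{\inf_{y_2\in\mathcal{X}_2}\{\varphi(y_1,y_2)+c_2(y_2,x_2)\}+c_1(y_1,x_1)\Big\}\\&=\inf_{y_1\in\mathcal{X}_1}\{\theta_{x_2}(y_1)+c_1(y_1,x_1)\}\\&=Q_1\theta_{x_2}(x_1)\end{aligned}$$

where

$$\theta_{x_2}(y_1)=Q_2\varphi_{y_1}(x_2)=\inf_{y_2\in\mathcal{X}_2}\{\varphi(y_1,y_2)+c_2(y_2,x_2)\}\qquad(4.9)$$

with $\varphi_{y_1}(y_2):=\varphi(y_1,y_2)$. Hence, for all $s\ge0$,

$$\begin{aligned}\int_{\mathcal{X}_1\times\mathcal{X}_2}e^{sQ\varphi}\,d(\mu_1\otimes\mu_2)&\overset{(a)}{=}\int_{\mathcal{X}_2}\Big(\int_{\mathcal{X}_1}e^{sQ_1\theta_{x_2}(x_1)}\,\mu_1(dx_1)\Big)\mu_2(dx_2)\\&\overset{(b)}{\le}\int_{\mathcal{X}_2}e^{\alpha_1^\circledast(s)+s\langle\theta_{x_2},\mu_1\rangle}\,\mu_2(dx_2)\\&\overset{(c)}{=}e^{\alpha_1^\circledast(s)}\int_{\mathcal{X}_2}\exp\Big(s\int_{\mathcal{X}_1}Q_2\varphi_{y_1}(x_2)\,\mu_1(dy_1)\Big)\mu_2(dx_2).\end{aligned}$$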

Equality (a) is justified since, $\varphi$ being bounded, $(x_1,x_2)\mapsto Q\varphi(x_1,x_2)=Q_1\theta_{x_2}(x_1)$ is jointly measurable.
Let us now prove the inequality (b). As $\varphi$ and $c_2$ are continuous, $(x_2,y_1)\mapsto\theta_{x_2}(y_1)$ is jointly upper semicontinuous as the infimum of a collection of continuous functions. Since $\theta_{x_2}(y_1)=Q_2\varphi_{y_1}(x_2)$ by (4.9), we have $\sup_{x_2,y_1}|\theta_{x_2}(y_1)|\le\sup_{y_1}\sup|\varphi_{y_1}|=\sup|\varphi|<\infty$. Therefore, $(x_2,y_1)\mapsto\theta_{x_2}(y_1)$ is an upper semicontinuous bounded function. Consequently, one is allowed to invoke (4.6) to obtain $\int_{\mathcal{X}_1}e^{sQ_1\theta_{x_2}(x_1)}\,\mu_1(dx_1)\le e^{\alpha_1^\circledast(s)+s\langle\theta_{x_2},\mu_1\rangle}$ for all $x_2$. Also note that $x_2\mapsto\langle\theta_{x_2},\mu_1\rangle$ is measurable since $(x_2,y_1)\mapsto\theta_{x_2}(y_1)$ is jointly measurable and bounded.

The last equality (c) is simply (4.9).



Remark 4.10. If $c_2$ is only assumed to be lower semicontinuous, the joint measurability of $(x_2,y_1)\mapsto\theta_{x_2}(y_1)$, which has been used to prove inequality (b), is far from clear. This is the reason why the cost functions are supposed to be continuous.

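But for all $x_2$,

$$\begin{aligned}\int_{\mathcal{X}_1}Q_2\varphi_{y_1}(x_2)\,\mu_1(dy_1)&=\int_{\mathcal{X}_1}\inf_{y_2\in\mathcal{X}_2}\{\varphi(y_1,y_2)+c_2(y_2,x_2)\}\,\mu_1(dy_1)\\&\le\inf_{y_2\in\mathcal{X}_2}\Big\{\int_{\mathcal{X}_1}\varphi(y_1,y_2)\,\mu_1(dy_1)+c_2(y_2,x_2)\Big\}\\&=Q_2\bar\varphi(x_2)\end{aligned}$$

where $y_2\mapsto\bar\varphi(y_2)=\int_{\mathcal{X}_1}\varphi(y_1,y_2)\,\mu_1(dy_1)$ is a continuous bounded function. Gathering our partial results leads us, for all $s\ge0$, to the inequality (a) below:

$$\int_{\mathcal{X}_1\times\mathcal{X}_2}e^{sQ\varphi}\,d(\mu_1\otimes\mu_2)\overset{(a)}{\le}e^{\alpha_1^\circledast(s)}\int_{\mathcal{X}_2}e^{sQ_2\bar\varphi}\,d\mu_2\overset{(b)}{\le}e^{\alpha_1^\circledast(s)+\alpha_2^\circledast(s)+s\langle\varphi,\mu_1\otimes\mu_2\rangle}.$$

Inequality (b) is a consequence of (4.7), since $\langle\bar\varphi,\mu_2\rangle=\langle\varphi,\mu_1\otimes\mu_2\rangle$. This is (4.8) and concludes the proof of the theorem. □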

4.4. Product of n spaces. The extension of Theorem 4.2 to the product of $n$ spaces is as follows. Let $\mathcal{X}_1,\dots,\mathcal{X}_n$ be $n$ Polish spaces and $\mu_1,\dots,\mu_n$ be probability measures on each of these spaces. On each space $\mathcal{X}_i$ let $c_i$ be a cost function. The cost function on the product space $\mathcal{X}_1\times\cdots\times\mathcal{X}_n$ is

$$c_1\oplus\cdots\oplus c_n((x_1,\dots,x_n),(y_1,\dots,y_n))=c_1(x_1,y_1)+\cdots+c_n(x_n,y_n).$$

Corollary 4.11. Let us assume that the cost functions $c_i$ are nonnegative, continuous and satisfy (3.9). Suppose that the convex transportation cost inequalities

$$\alpha_i(\mathcal{T}_{c_i}(\mu_i,\nu_i))\le H(\nu_i\mid\mu_i),\qquad\forall\nu_i\in\mathcal{P}(\mathcal{X}_i),\ i=1,\dots,n$$

hold with $\alpha_1,\dots,\alpha_n\in\mathcal{C}$. Then, on the product space $\mathcal{X}_1\times\cdots\times\mathcal{X}_n$, we have the convex transportation cost inequality

$$\alpha_1\square\cdots\square\alpha_n\big(\mathcal{T}_{c_1\oplus\cdots\oplus c_n}(\mu_1\otimes\cdots\otimes\mu_n,\nu)\big)\le H(\nu\mid\mu_1\otimes\cdots\otimes\mu_n),\qquad\forall\nu\in\mathcal{P}(\mathcal{X}_1\times\cdots\times\mathcal{X}_n)$$

where

$$\alpha_1\square\cdots\square\alpha_n(t)=\inf\{\alpha_1(t_1)+\cdots+\alpha_n(t_n);\ t_1,\dots,t_n\ge0:\ t_1+\cdots+t_n=t\},\qquad t\ge0,$$

is the inf-convolution of $\alpha_1,\dots,\alpha_n$.

Proof. It follows from Theorem 4.2 by induction, noting that $\alpha_1\square\cdots\square\alpha_n=(\alpha_1\square\cdots\square\alpha_{n-1})\square\alpha_n$ for all $n$ and that these inf-convolutions stay in $\mathcal{C}$ by Lemma 4.1(a). □

In the special situation where the $n$ TCIs are copies of a unique TCI on a Polish space $\mathcal{X}$, we have the following important result.

Theorem 4.12. Let us assume that the cost function $c$ is nonnegative, continuous and satisfies (3.9). Suppose that the convex transportation cost inequality

$$\alpha(\mathcal{T}_c(\mu,\nu))\le H(\nu\mid\mu),\qquad\forall\nu\in\mathcal{P}(\mathcal{X})$$

holds with $\alpha\in\mathcal{C}$. Then, on the product space $\mathcal{X}^n$, we have the following convex transportation cost inequality

$$n\,\alpha\Big(\frac{\mathcal{T}_{c^{\oplus n}}(\mu^{\otimes n},\zeta)}{n}\Big)\le H(\zeta\mid\mu^{\otimes n}),\qquad\forall\zeta\in\mathcal{P}(\mathcal{X}^n),$$

where $c^{\oplus n}((x_1,\dots,x_n),(y_1,\dots,y_n))=c(x_1,y_1)+\cdots+c(x_n,y_n)$.

Proof. This is a direct application of Corollary 4.11, noting that $\alpha^{\square n}(t)=n\,\alpha(t/n)$. □

About dimension-free tensorized convex TCIs. Let us say that a convex transportation cost inequality

$$\alpha(\mathcal{T}_c(\mu,\nu))\le H(\nu\mid\mu),\qquad\forall\nu\in\mathcal{P}(\mathcal{X})\qquad(4.13)$$

has the dimension-free tensorization property if the inequality

$$\alpha\big(\mathcal{T}_{c^{\oplus n}}(\mu^{\otimes n},\zeta)\big)\le H(\zeta\mid\mu^{\otimes n}),\qquad\forall\zeta\in\mathcal{P}(\mathcal{X}^n)$$

holds for all $n\in\mathbb{N}^*$.

Clearly, according to Theorem 4.12, if $\alpha\in\mathcal{C}$ is of the form $\alpha(t)=at$ with $a\ge0$, then (4.13) has the dimension-free tensorization property.

Remark 4.14. Thanks to the same theorem, a seemingly weaker sufficient condition on $\alpha$ for (4.13) to be dimension-free is $\alpha(t)\le\inf_{n\ge1}n\alpha(t/n)$, $t\ge0$. As $\alpha$ is in $\mathcal{C}$, $\alpha(t)/t$ is an increasing function, so that $\alpha'(0):=\lim_{t\downarrow0}\alpha(t)/t$ exists. It follows that $n\mapsto n\alpha(t/n)$ is nonincreasing and that $\lim_{n\to\infty}n\alpha(t/n)=\alpha'(0)t$ for all $t\ge0$. Therefore, the condition $\alpha(t)\le\inf_{n\ge1}n\alpha(t/n)$, $t\ge0$, is equivalent to $\alpha(t)\le\alpha'(0)t$, $t\ge0$. But since $\alpha$ is convex, the converse inequality also holds, that is $\alpha(t)\ge\alpha'(0)t$, $t\ge0$. Consequently, $\alpha$ is of the form $\alpha(t)=at$ with $a\ge0$.
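Remark 4.14 can be illustrated with the hypothetical choice $\alpha(t)=t+t^2$, which is convex with $\alpha(0)=0$ and $\alpha'(0)=1$. Then $n\alpha(t/n)=t+t^2/n$ decreases to $\alpha'(0)t=t$, so the condition $\alpha(t)\le\inf_{n\ge1}n\alpha(t/n)$ fails, as the remark predicts for any nonlinear $\alpha$. The Python sketch below (assuming NumPy) also checks on a grid that $\alpha\square\alpha(t)=2\alpha(t/2)$, the case $n=2$ of the identity used in the proof of Theorem 4.12.

```python
import numpy as np

alpha = lambda t: t + t**2      # hypothetical convex alpha with alpha(0) = 0, alpha'(0) = 1

t = 2.0
seq = np.array([n * alpha(t / n) for n in range(1, 201)])   # equals t + t^2 / n

print(seq[:4])                  # 6.0, 4.0, 3.33..., 3.0: decreasing in n
print(seq[-1])                  # close to alpha'(0) * t = 2.0
print(alpha(t) <= seq.min())    # False: the dimension-free condition fails

# n = 2 case of the n-fold inf-convolution identity: equal splitting is optimal.
t1 = np.linspace(0.0, t, 2001)
print(np.min(alpha(t1) + alpha(t - t1)), 2 * alpha(t / 2))
```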

Dimension-free tensorization is a phenomenon that can only happen when dealing with non-metric cost functions. Indeed, we show in the following proposition that convex $T_1$-inequalities having this property are all trivial.

Proposition 4.15. Let $(\mathcal{X},d)$ be a Polish space and $\mu\in\mathcal{P}(\mathcal{X})$. The convex transportation cost inequality

$$\alpha(\mathcal{T}_d(\mu,\nu))\le H(\nu\mid\mu),\qquad\forall\nu\in\mathcal{P}(\mathcal{X}),\qquad(4.16)$$

with $\alpha\in\mathcal{C}$ has the dimension-free tensorization property if and only if $\alpha=0$ or $\mu$ is a Dirac mass.
Proof. If $\alpha=0$, it is clear that (4.16) has the dimension-free tensorization property. If $\mu$ is a Dirac mass, it is easy to see that (4.16) holds for every $\alpha\in\mathcal{C}$. Noting that a tensor product of Dirac measures is again a Dirac measure, the dimension-free tensorization property is established in this special case.

Now, suppose that (4.16) has the dimension-free tensorization property, with $\alpha\ne0$, and let us prove that $\mu$ is a Dirac mass. According to Theorem 3.17, the following inequality

$$\log\int e^{s(\varphi(x_1)+\cdots+\varphi(x_n)-n\langle\varphi,\mu\rangle)}\,\mu^{\otimes n}(dx_1\cdots dx_n)\le\alpha^\circledast(s),\qquad\forall s\ge0,$$

holds for all bounded 1-Lipschitz $\varphi$ and all $n\ge1$. As a consequence, denoting by $\Lambda_\varphi$ the log-Laplace transform of $\varphi(X)-\langle\varphi,\mu\rangle$, where $X$ has law $\mu$, and noting that the left-hand side above equals $n\Lambda_\varphi(s)$, one has $\Lambda_\varphi\le\frac1n\alpha^\circledast$ for all $n\ge1$, and so $\Lambda_\varphi\le0$ on $\mathrm{dom}\,\alpha^\circledast$ (the effective domain of $\alpha^\circledast$). But by Jensen's inequality, one obtains immediately $\Lambda_\varphi\ge0$. Thus $\Lambda_\varphi=0$ on $\mathrm{dom}\,\alpha^\circledast$. As $\alpha\ne0$, $[0,a[\,\subset\mathrm{dom}\,\alpha^\circledast$ for some $a>0$. Considering $-\varphi$ instead of $\varphi$ in the above reasoning yields that $\Lambda_\varphi=0$ on $]-a,a[$. This easily implies that $\mu_\varphi$ (the image of $\mu$ under $\varphi$) is a Dirac mass. Now, let us take a point $x_0$ in the support of $\mu$ and consider the bounded 1-Lipschitz function $\varphi_0(x)=d(x,x_0)\wedge1$, $x\in\mathcal{X}$. As $x_0$ is in the support of $\mu$, $\mu_{\varphi_0}([0,\varepsilon[)=\mu(\varphi_0<\varepsilon)>0$ for all $\varepsilon>0$. As $\mu_{\varphi_0}$ is a Dirac mass, one thus has $\mu(\varphi_0<\varepsilon)=1$ for all $\varepsilon>0$. This easily implies that $\mu=\delta_{x_0}$. □



5. Integral criteria 

Our aim in this section is to give integral criteria for a convex $\mathcal{T}$-inequality to hold. Let us first note that when two $\mathcal{T}$-inequalities $\alpha_0(\mathcal{T}(\nu))\le H(\nu\mid\mu)$, $\forall\nu\in\mathcal{N}$, and $\alpha_1(\mathcal{T}(\nu))\le H(\nu\mid\mu)$, $\forall\nu\in\mathcal{N}$, hold, then we have the resulting new inequality $\alpha(\mathcal{T}(\nu))\le H(\nu\mid\mu)$, $\forall\nu\in\mathcal{N}$, with

$$\alpha=\max(\alpha_0,\alpha_1).\qquad(5.1)$$

This allows us to separate our investigation into two parts: obtaining $\alpha_0$ and $\alpha_1$, which control respectively the small values of $t$ (a neighbourhood of $t=0$) and the large values of $t$. Let us go on with some vocabulary.

5.1. Transportation functions and deviation functions. We introduce the following definitions. Recall that $\mathcal{T}$ is defined at (2.4).

Definition 5.2 (Transportation function). A left continuous increasing function $\alpha:[0,\infty)\to[0,\infty]$ is called a transportation function for $\mathcal{T}$ in $\mathcal{N}$ if

$$\alpha(\mathcal{T}(\nu))\le H(\nu\mid\mu),\qquad\forall\nu\in\mathcal{N}.$$

This means that the $\mathcal{T}$-inequality (2.3) holds with $\alpha$.

Definition 5.3 (Deviation function). A left continuous increasing function $\alpha:[0,\infty)\to[0,\infty]$ is called a deviation function for $\mathcal{T}$ if

$$\limsup_{n\to\infty}\frac1n\log\mathbb{P}(\mathcal{T}(L_n)\ge t)\le-\alpha(t),\qquad\forall t\ge0.$$

In what follows, these functions will simply be called transportation and deviation functions, without any reference to $\mathcal{T}$ and $\mathcal{N}$.
Remark 5.4. For $\mathcal{T}(L_n)$ to be measurable, it is assumed that $\Phi$ is a set of couples of continuous functions. Indeed,

$$\Big\{x\in\mathcal{X}^n;\ \mathcal{T}\Big(\frac1n\sum_{i=1}^n\delta_{x_i}\Big)\le t\Big\}=\bigcap_{(\psi,\varphi)\in\Phi}\Big\{x\in\mathcal{X}^n;\ \frac1n\sum_{i=1}^n\varphi(x_i)+\langle\psi,\mu\rangle\le t\Big\}$$

is a closed set.

Note that an increasing function is left continuous if and only if it is lower semicontinuous. Clearly, the best transportation function is the left continuous version of the increasing function

$$t\mapsto\inf\{H(\nu\mid\mu);\ \nu\in\mathcal{N},\ \mathcal{T}(\nu)\ge t\},\qquad t\ge0.$$

Similarly, the best deviation function is the left continuous version of the increasing function

$$t\mapsto-\limsup_{n\to\infty}\frac1n\log\mathbb{P}(\mathcal{T}(L_n)\ge t)\in[0,\infty],\qquad t\ge0.$$



Proposition 5.5. Under the assumptions of Theorem 3.1, any deviation function $\alpha$ in the class $\mathcal{C}$ is a transportation function.

Proof. Let $\alpha\in\mathcal{C}$ be a deviation function. Since $\mathcal{T}(L_n)\ge\mathcal{T}_n^\varphi$ for all $\varphi\in\Phi$, we clearly have $\mathbb{P}(\mathcal{T}(L_n)\ge t)\ge\mathbb{P}(\mathcal{T}_n^\varphi\ge t)$ for all $t\ge0$ and $n$. Therefore, for all $\varphi$ and $t$, $\limsup_{n\to\infty}\frac1n\log\mathbb{P}(\mathcal{T}_n^\varphi\ge t)\le\limsup_{n\to\infty}\frac1n\log\mathbb{P}(\mathcal{T}(L_n)\ge t)\le-\alpha(t)$. This implies statement (d) of Theorem 3.7, which in turn is equivalent to statement (a) of Theorem 3.7, which is the desired result. □

5.2. Controlling the large values of t. In this subsection, it is assumed that the deviation and transportation functions are in $\mathcal{C}$.

Proposition 5.6. The first statement is concerned with convex TCIs and the second one with convex $\mathcal{T}$-inequalities.

(a) If $\beta\in\mathcal{C}$ satisfies $\int_{\mathcal{X}}\exp\big[\beta\big(\int_{\mathcal{X}}c(x,y)\,\mu(dy)\big)\big]\,\mu(dx)\le A<\infty$, then

$$\alpha(t)=\max(0,\beta(t)-\log A),\qquad t\ge0,$$

is a transportation function.

(b) Let us suppose that $\alpha$ is a transportation function. Then for all $(\psi,\varphi)\in\Phi$,

$$\int_{\mathcal{X}}\exp\big[\delta\alpha(\varphi(x)+\langle\psi,\mu\rangle)\big]\,\mu(dx)\le\frac{1+\delta}{1-\delta}<\infty,\qquad\forall\,0<\delta<1.$$

Remarks.

• In (a), because of Jensen's inequality, one can take $A\ge\int_{\mathcal{X}^2}\exp\beta(c(x,y))\,\mu(dx)\mu(dy)$.

• About (a), if $c=d\le D<\infty$ is a lower semicontinuous bounded metric, one recovers that $\alpha(t)=0$ if $t\le D$ and $\alpha(t)=+\infty$ if $t>D$ is a transportation function, which is obvious.

• About (b) in the case of a TCI, let us note that $\sup_{(\psi,\varphi)\in\Phi_c}(\varphi(x)+\langle\psi,\mu\rangle)\le\int_{\mathcal{X}}\sup_{(\psi,\varphi)\in\Phi_c}(\varphi(x)+\psi(y))\,\mu(dy)\le\int_{\mathcal{X}}c(x,y)\,\mu(dy)$ for all $x$. It follows that $\int_{\mathcal{X}}\exp[\delta\alpha(\varphi(x)+\langle\psi,\mu\rangle)]\,\mu(dx)\le\int_{\mathcal{X}}\exp[\delta\alpha(\int_{\mathcal{X}}c(x,y)\,\mu(dy))]\,\mu(dx)$ for all $(\psi,\varphi)\in\Phi_c$. It would be pleasant to obtain the finiteness of an integral in terms of $c$. In the case where $c(x,y)=d(x,y)^p$, this will be performed below in Corollary 5.14.

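Proof. Let us prove (a). As the product measure $\mu(dx)L_n(dy)$ has the right marginal measures, we get $\mathcal{T}_c(\mu,L_n):=\mathcal{T}_n\le\int_{\mathcal{X}^2}c(x,y)\,\mu(dx)L_n(dy)=\langle c_\mu,L_n\rangle$ with $c_\mu(y):=\int_{\mathcal{X}}c(x,y)\,\mu(dx)$. It follows that for all $t\ge0$,

$$\begin{aligned}\mathbb{P}(\mathcal{T}_n\ge t)&\le\mathbb{P}(\langle c_\mu,L_n\rangle\ge t)\\&\overset{(a)}{=}\mathbb{P}\big(\beta(\langle c_\mu,L_n\rangle)\ge\beta(t)\big)\\&\overset{(b)}{\le}\mathbb{P}\big(\langle\beta\circ c_\mu,L_n\rangle\ge\beta(t)\big)\\&\overset{(c)}{=}\mathbb{P}\Big(e^{\sum_{i=1}^n\beta\circ c_\mu(X_i)}\ge e^{n\beta(t)}\Big)\\&\overset{(d)}{\le}e^{-n\beta(t)}\,\mathbb{E}\,e^{\sum_{i=1}^n\beta\circ c_\mu(X_i)}\\&\overset{(e)}{=}e^{-n\beta(t)}\Big(\int_{\mathcal{X}}e^{\beta\circ c_\mu}\,d\mu\Big)^n\end{aligned}$$

where equality (a) follows from the monotonicity of $\beta$, (b) from the convexity of $\beta$ and Jensen's inequality, (c) from the monotonicity of the exponential, (d) from Markov's inequality and (e) from the fact that $(X_i)$ is an iid sequence. Finally,

$$\limsup_{n\to\infty}\frac1n\log\mathbb{P}(\mathcal{T}_n\ge t)\le-\beta(t)+\log\int_{\mathcal{X}}e^{\beta\circ c_\mu}\,d\mu,\qquad\forall t\ge0,$$

which, with Proposition 5.5, leads to the desired result.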

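Let us prove (b). As $\alpha\in\mathcal{C}$ is a transportation function, by Theorem 3.7 (keeping the notations of Theorem 3.7) we have

$$\alpha(t)\le\Lambda_\theta^*(t),\qquad\forall\theta\in\Phi,\ \forall t\ge0.$$

By Lemma 5.7 below, as $\Lambda_\theta^*$ is the Cramér transform of $\varphi(X)+\langle\psi,\mu\rangle$ for $\theta=(\psi,\varphi)$, we get

$$\mathbb{E}\exp\big[\delta\Lambda_\theta^*(\varphi(X)+\langle\psi,\mu\rangle)\big]\le\frac{1+\delta}{1-\delta},\qquad\forall\,0<\delta<1.$$

Extending $\alpha$ with $\alpha(t)=0$ for all $t<0$, we obtain $\alpha\le\Lambda_\theta^*$ on $\mathbb{R}$ for all $\theta\in\Phi$. Consequently we obtain

$$\int_{\mathcal{X}}\exp\big[\delta\alpha(\varphi(x)+\langle\psi,\mu\rangle)\big]\,\mu(dx)\le\frac{1+\delta}{1-\delta},\qquad\forall\,0<\delta<1,\ \forall\theta=(\psi,\varphi)\in\Phi.$$

As $e^{\delta\alpha}$ is increasing, the desired result follows by monotone convergence. □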
During the above proof, the following lemma has been used. 

Lemma 5.7. Let $Z$ be a real random variable such that $\mathbb{E}e^{\lambda_o|Z|}<\infty$ for some $\lambda_o>0$. Let $h$ be its Cramér transform. Then for all $0<\delta<1$, $\mathbb{E}\exp[\delta h(Z)]\le(1+\delta)/(1-\delta)$.

Proof. This result with the upper bound $2/(1-\delta)$ instead of $(1+\delta)/(1-\delta)$ can be found in (jjj, Lemma 5.1.14). For a proof of the improvement with $(1+\delta)/(1-\delta)$ see ^Uj. □
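Lemma 5.7 can be checked on two examples where the Cramér transform is explicit. For $Z\sim\mathcal{N}(0,1)$, $h(z)=z^2/2$ and $\mathbb{E}\exp[\delta h(Z)]=(1-\delta)^{-1/2}$; for a Bernoulli($p$) variable, $h(1)=-\log p$ and $h(0)=-\log(1-p)$. The Python sketch below (assuming NumPy; the choice of examples and the helper names are ours) compares both expectations with $(1+\delta)/(1-\delta)$.

```python
import numpy as np

def bernoulli_expectation(p, delta):
    """E exp[delta h(Z)] for Z ~ Bernoulli(p); h(1) = -log p and h(0) = -log(1 - p)
    are the values of the Cramer transform sup_l {l z - log(1 - p + p e^l)} at z = 1, 0."""
    h0, h1 = -np.log(1.0 - p), -np.log(p)
    return (1.0 - p) * np.exp(delta * h0) + p * np.exp(delta * h1)

def gaussian_expectation(delta):
    """For Z ~ N(0, 1): h(z) = z^2 / 2 and E exp[delta Z^2 / 2] = (1 - delta)^(-1/2)."""
    return (1.0 - delta) ** -0.5

ps = np.linspace(0.01, 0.99, 99)
for delta in (0.1, 0.5, 0.9, 0.99):
    bound = (1.0 + delta) / (1.0 - delta)        # bound of Lemma 5.7
    worst_bernoulli = max(bernoulli_expectation(p, delta) for p in ps)
    print(delta, worst_bernoulli, gaussian_expectation(delta), bound)
```

In both examples the bound is far from tight, which is consistent with the lemma being a universal estimate valid for every law with an exponential moment.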

Corollary 5.8. In this statement $d$ is a lower semicontinuous semimetric and $c$ is a lower semicontinuous cost function such that $c(x,x)=0$ for all $x\in\mathcal{X}$.

(a) Suppose that there exists a nonnegative measurable function $\chi$ such that

$$c\le\chi\oplus\chi.$$

Let $\gamma\in\mathcal{C}$ be such that $\int_{\mathcal{X}}\exp[\gamma\circ\chi(x)]\,\mu(dx)\le B<\infty$. Then for any $x_0\in\mathcal{X}$,

$$t\mapsto2\max\big(0,2\gamma(t/4)-\gamma\circ\chi(x_0)-\log B\big),\qquad t\ge0,$$

is a transportation function for $c$.

(b) Suppose that there exists $\theta\in\mathcal{C}$ such that

$$\theta(d)\le c.$$

If $\alpha\in\mathcal{C}$ is a transportation function for $c$, then

$$\int_{\mathcal{X}}\exp[u\,\alpha\circ\theta(d(x_0,x)/2)]\,\mu(dx)<\infty$$

for all $x_0\in\mathcal{X}$ and all $0<u<2$.

Proof. We begin with the case where $c=d$, $\chi(x)=d(x_0,x)$ and $\theta(d)=d$.

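The case $c=d$. To prove (a) with $\chi(x)=d(x_0,x)$, we apply statement (a) of Proposition 5.6. Let $\beta$ be in the class $\mathcal{C}$. We have for all $x_0\in\mathcal{X}$

$$\begin{aligned}\int_{\mathcal{X}}\exp\Big[\beta\Big(\int_{\mathcal{X}}d(x,y)\,\mu(dy)\Big)\Big]\mu(dx)&\le\int_{\mathcal{X}^2}\exp[\beta(d(x,y))]\,\mu(dx)\mu(dy)\\&\le\int_{\mathcal{X}^2}\exp\Big[\beta\Big(\frac{2d(x_0,x)+2d(x_0,y)}{2}\Big)\Big]\mu(dx)\mu(dy)\\&\le\int_{\mathcal{X}^2}\exp\big[\beta(2d(x_0,x))/2+\beta(2d(x_0,y))/2\big]\,\mu(dx)\mu(dy)\\&=\Big(\int_{\mathcal{X}}\exp\Big[\frac{\beta(2d(x_0,x))}{2}\Big]\mu(dx)\Big)^2.\end{aligned}\qquad(5.9)$$

Taking $\gamma(t)=\beta(2t)/2$, one gets $A=B^2$ and

$$t\mapsto\max(0,\beta(t)-\log A)=\max(0,2\gamma(t/2)-2\log B)$$

is a transportation function for $c=d$.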

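Now, let us prove (b). Thanks to the Kantorovich-Rubinstein equality (2.2), one can take $\Phi=\{(-\varphi,\varphi);\ \|\varphi\|_{\mathrm{Lip}}\le1,\ \varphi\text{ bounded}\}$. Because of Proposition 5.6(b), we have for all bounded $\varphi$ with $\|\varphi\|_{\mathrm{Lip}}\le1$:

$$\int_{\mathcal{X}}\exp\big[\delta\alpha(\varphi(x)-\langle\varphi,\mu\rangle)\big]\,\mu(dx)\le(1+\delta)/(1-\delta),\qquad\forall\,0<\delta<1.$$

The function $\varphi(x)=d(x_0,x)$ is 1-Lipschitz but it is not bounded in general. Let us introduce an approximation procedure. For all $k>0$, with $m:=\int_{\mathcal{X}}d(x_0,y)\,\mu(dy)$, since $\alpha$ is increasing and $d(x_0,\cdot)\wedge k$ is bounded and 1-Lipschitz, we have

$$\int_{\mathcal{X}}\exp\big[\delta\alpha\big((d(x_0,x)\wedge k)-m\big)\big]\,\mu(dx)\le\int_{\mathcal{X}}\exp\Big[\delta\alpha\Big((d(x_0,x)\wedge k)-\int_{\mathcal{X}}[d(x_0,y)\wedge k]\,\mu(dy)\Big)\Big]\mu(dx)\le(1+\delta)/(1-\delta).$$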
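By monotone convergence, one concludes that for all $0<\delta<1$,

$$\int_{\mathcal{X}}\exp\big[\delta\alpha(d(x_0,x)-m)\big]\,\mu(dx)\le(1+\delta)/(1-\delta).$$

As

$$2\delta\alpha(d(x_0,x)/2)=2\delta\alpha\Big(\frac{d(x_0,x)-m}{2}+\frac m2\Big)\le\delta\big[\alpha(d(x_0,x)-m)+\alpha(m)\big],$$

one sees that

$$\int_{\mathcal{X}}\exp\big[2\delta\alpha(d(x_0,x)/2)\big]\,\mu(dx)\le e^{\delta\alpha(m)}\int_{\mathcal{X}}\exp\big[\delta\alpha(d(x_0,x)-m)\big]\,\mu(dx)\le e^{\delta\alpha(m)}(1+\delta)/(1-\delta),$$

which leads to

$$\int_{\mathcal{X}}\exp\big[2\delta\alpha(d(x_0,x)/2)\big]\,\mu(dx)<\infty.\qquad(5.10)$$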

The general case. Let us prove (a). It is clear that $c(x,y)\le d_\chi(x,y)$, where $d_\chi$ is the semimetric defined by

$$d_\chi(x,y)=\mathbf{1}_{x\ne y}\,(\chi(x)+\chi(y)).\qquad(5.11)$$

Remark 5.12. If $\chi$ admits two or more zeros, $d_\chi$ is a semimetric. Otherwise it is a metric. In the often studied case where $c=d^p$ with $d$ a metric and $p\ge1$, one takes $\chi(x)=2^{p-1}d(x_0,x)^p$ (see the proof of Corollary 5.14 below) and $d_\chi$ is a metric.

Of course, for all $\nu\in\mathcal{N}=\mathcal{P}_\chi=\{\nu\in\mathcal{P}(\mathcal{X});\ \int_{\mathcal{X}}\chi(x)\,\nu(dx)<\infty\}$, we have

$$\mathcal{T}_c(\nu)\le\mathcal{T}_{d_\chi}(\nu).$$

Therefore, any transportation function for $d_\chi$ is a transportation function for $c$. This easy but powerful trick is borrowed from the monograph by C. Villani ([T^j, Proposition 7.10). It has been proved at (5.9) that if $\int_{\mathcal{X}}\exp[\beta(d_\chi(x_0,x))]\,\mu(dx)\le C<\infty$ for some function $\beta\in\mathcal{C}$, then $\max(0,2\beta(t/2)-2\log C)$ is a transportation function for $d_\chi$.

Taking $\beta(t)=2\gamma(t/2)$, with convexity we have

$$\beta(d_\chi(x_0,x))\le\gamma\circ\chi(x_0)+\gamma\circ\chi(x)\qquad(5.13)$$

so that $\int_{\mathcal{X}}\exp[\beta(d_\chi(x_0,x))]\,\mu(dx)\le e^{\gamma\circ\chi(x_0)}B=C$. This leads us to $\max(0,2\beta(t/2)-2\log C)=2\max(0,2\gamma(t/4)-\gamma\circ\chi(x_0)-\log B)$, which is the desired result.

Let us prove (b). Because of Jensen's inequality, it is easy to show that $\theta(\mathcal{T}_d)\le\mathcal{T}_c$. As $\alpha$ is a transportation function for $c$, it follows that $\alpha\circ\theta$ is a transportation function for $d$. Applying the already proved result (5.10) with $\alpha\circ\theta$ instead of $\alpha$ completes the proof of the corollary. □

Now, we consider an important special case of convex TCI.

Corollary 5.14 ($c=d^p$). In this statement $c=d^p$, where $d$ is a lower semicontinuous metric and $p\ge1$.

(a) Let $\gamma\in\mathcal{C}$ be such that $\int_{\mathcal{X}}\exp[\gamma(d^p(x_0,y))]\,\mu(dy)\le B<\infty$ for some $x_0\in\mathcal{X}$. Then

$$t\mapsto\max\big(0,2\gamma(2^{-p}t)-2\log B\big),\qquad t\ge0,$$

is a transportation function.

(b) If $\alpha\in\mathcal{C}$ is a transportation function, then

$$\int_{\mathcal{X}}\exp\big[u\,\alpha(2^{-p}d^p(x_0,x))\big]\,\mu(dx)<\infty$$

for all $x_0\in\mathcal{X}$ and all $0<u<2$.



Proof. This is Corollary 5.8 with $\chi(x)=2^{p-1}d^p(x_0,x)$, $\theta(d)=d^p$ and the following improvement in the treatment of the inequality (5.13). One can write $\beta(d_\chi(x_0,x))\le\gamma\circ\chi(x_0)+\gamma\circ\chi(x)=\gamma\circ\chi(x)$, since $\gamma\circ\chi(x_0)=0$ in this situation. As a consequence, $\max(0,2\gamma(2^{-p}t)-2\log B)$ is a transportation function, which is a little better than its counterpart in Corollary 5.8. □

Remark 5.15. It is known that the standard Gaussian measure $\mu$ on $\mathbb{R}$ satisfies $T_2$, which is the TCI with $c(x,y)=(x-y)^2$ and the transportation function $\alpha(t)=t/2$ (see [T5]). As a consequence of Corollary 5.14(b), for all $p>2$, there is no function $\alpha$ in $\mathcal{C}$ except $\alpha=0$ which is a transportation function for the standard Gaussian measure and the cost function $|x-y|^p$.



5.3. Controlling the small values of t. We are going to prove a general result for the behaviour of a transportation function in the neighbourhood of zero. By a general result, it is meant that $\mu$ is not specified. As a consequence, it will only be shown that, under the assumption that $c\le\chi\oplus\chi$ where $\int_{\mathcal{X}}e^{\delta_0\chi}\,d\mu<\infty$ for some $\delta_0>0$, there are transportation functions which are larger than some quadratic function around zero. Obtaining better results in this direction is difficult and requires more stringent restrictions on the reference probability measure $\mu$.

Proposition 5.16. Let $c$ be a cost function satisfying (3.9) and $c\le\chi\oplus\chi$ for some nonnegative measurable function $\chi$ satisfying $\int_{\mathcal{X}}e^{\delta_0\chi}\,d\mu<\infty$ for some $\delta_0>0$. Then $\|\chi\|_\rho$ is finite and

$$\alpha_0(t)=\Big(\sqrt{\frac{t}{\|\chi\|_\rho}+1}-1\Big)^2,\qquad t\ge0,$$

is a transportation function for $c$ and $\mu$.

In particular, for all $a>0$ such that $\int_{\mathcal{X}}e^{a\chi}\,d\mu\le2$, $t\mapsto(\sqrt{at+1}-1)^2$ is a transportation function.

Note that $(\sqrt{at+1}-1)^2=a^2t^2/4+o_{t\to0}(t^2)=at-2\sqrt{at}+2+o_{t\to\infty}(1)$. The Orlicz norm $\|\chi\|_\rho$ is defined at (3.21).

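Proof. Because of our assumptions, we have $\mathcal{T}_c\le\mathcal{T}_{d_\chi}$, see (5.11). Hence, it is enough to show that $\alpha_0$ is a transportation function for $d_\chi$. But this follows from Lemma 5.17 below and Corollary 3.24.

The last statement follows from a simple manipulation on the definition of the Orlicz norm: $\int_{\mathcal{X}}e^{a\chi}\,d\mu\le2$ means that $\|\chi\|_\rho\le1/a$, and $\alpha_0$ is a nonincreasing function of $\|\chi\|_\rho$. □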
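The two asymptotic expansions of $(\sqrt{at+1}-1)^2$ stated after Proposition 5.16 can be checked numerically; the exact identity behind the second one is $(\sqrt{at+1}-1)^2-(at-2\sqrt{at}+2)=-2/(\sqrt{at}+\sqrt{at+1})$. The short Python sketch below (assuming NumPy) uses an arbitrary illustrative value of $a$.

```python
import numpy as np

def alpha_a(t, a):
    """t -> (sqrt(a t + 1) - 1)^2, the transportation function of Proposition 5.16."""
    return (np.sqrt(a * t + 1.0) - 1.0) ** 2

a = 1.5                                   # arbitrary illustrative value (assumption)
for t in (1e-2, 1e-3, 1e-4):              # ratio to a^2 t^2 / 4 tends to 1
    print(t, alpha_a(t, a) / (a**2 * t**2 / 4))
for t in (1e2, 1e4, 1e6):                 # gap to a t - 2 sqrt(a t) + 2 tends to 0
    print(t, alpha_a(t, a) - (a * t - 2 * np.sqrt(a * t) + 2))
```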

The following lemma has been used in the previous proof. 

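Lemma 5.17. For all $\mu$ and $\nu$ in $\mathcal{P}_\chi:=\{\nu\in\mathcal{P}(\mathcal{X});\ \int_{\mathcal{X}}\chi\,d\nu<\infty\}$, we have

$$\mathcal{T}_{d_\chi}(\mu,\nu)=\|\chi\cdot(\nu-\mu)\|_{TV}.$$

Proof. By the Kantorovich-Rubinstein equality (2.2), we have $\mathcal{T}_{d_\chi}(\mu,\nu)=\sup\{\int_{\mathcal{X}}\varphi\,d(\nu-\mu);\ \varphi\in B(\mathcal{X}),\ \|\varphi\|_{\mathrm{Lip}}\le1\}$, where $\|\varphi\|_{\mathrm{Lip}}\le1$ is equivalent to $|\varphi(x)-\varphi(y)|\le d_\chi(x,y)$ for all $x,y$. One can prove without trouble (see [TU]) that this is equivalent to $|\varphi(x)-a|\le\chi(x)$, $\forall x$, for some real $a$. Since $\int_{\mathcal{X}}a\,d(\nu-\mu)=0$, it follows that

$$\begin{aligned}\mathcal{T}_{d_\chi}(\mu,\nu)&=\sup\Big\{\int_{\mathcal{X}}\varphi\,d(\nu-\mu);\ \varphi\in B(\mathcal{X}):\ |\varphi|\le\chi\Big\}\\&=\sup_{k}\sup\Big\{\int_{\mathcal{X}}(\chi\wedge k)\vartheta\,d(\nu-\mu);\ \vartheta\in B(\mathcal{X}):\ |\vartheta|\le1\Big\}\\&=\|\chi\cdot(\nu-\mu)\|_{TV},\end{aligned}$$

which is the desired result. □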
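On a finite space, Lemma 5.17 can be verified by exhibiting a primal-dual pair with equal values. The Python sketch below (assuming NumPy; the data and helper names are illustrative) builds the cost matrix of (5.11), a coupling that leaves $\min(\mu,\nu)$ in place and spreads the excess $(\mu-\nu)_+$ over the deficit $(\nu-\mu)_+$, and the potential $\varphi=\chi\,\mathrm{sign}(\nu-\mu)$, which satisfies $|\varphi|\le\chi$ and is therefore 1-Lipschitz for $d_\chi$. By weak duality, equality of the two values shows that both are optimal.

```python
import numpy as np

def d_chi(chi):
    """Cost matrix d_chi(x, y) = 1_{x != y} (chi(x) + chi(y)) of (5.11) on a finite space."""
    c = chi[:, None] + chi[None, :]
    np.fill_diagonal(c, 0.0)
    return c

def explicit_plan(mu, nu):
    """A coupling of mu and nu: keep min(mu, nu) in place and send the excess
    (mu - nu)_+ to the deficit (nu - mu)_+ proportionally."""
    plus = np.clip(mu - nu, 0.0, None)
    minus = np.clip(nu - mu, 0.0, None)
    plan = np.diag(np.minimum(mu, nu))
    if plus.sum() > 0:
        plan = plan + np.outer(plus, minus) / plus.sum()
    return plan

mu = np.array([0.5, 0.2, 0.2, 0.1])          # illustrative data (assumption)
nu = np.array([0.1, 0.3, 0.2, 0.4])
chi = np.array([1.0, 0.0, 2.0, 3.0])

C = d_chi(chi)
plan = explicit_plan(mu, nu)
primal = float(np.sum(plan * C))             # cost of an admissible coupling (upper bound)
phi = chi * np.sign(nu - mu)                 # |phi| <= chi, hence 1-Lipschitz for d_chi
dual = float(np.dot(phi, nu - mu))           # value of an admissible potential (lower bound)
weighted_tv = float(np.sum(chi * np.abs(nu - mu)))
print(primal, dual, weighted_tv)             # the three numbers coincide
```

The construction is general: mass only moves between the disjoint supports of $(\mu-\nu)_+$ and $(\nu-\mu)_+$, so the cost of this coupling is always $\sum_x\chi(x)|\nu(x)-\mu(x)|$.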

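5.4. An application: $T_1$-inequalities. A $T_1$-inequality is a TCI with $c=d$. Let us denote $\mathcal{P}_d(\mathcal{X})=\{\nu\in\mathcal{P}(\mathcal{X});\ \int_{\mathcal{X}}d(x_*,x)\,\nu(dx)<\infty$ for some (and therefore all) $x_*\in\mathcal{X}\}$. Suppose that $\mu$ is in $\mathcal{P}_d(\mathcal{X})$. The function $\alpha$ is said to satisfy the $T_1$-inequality for $d$ and $\mu$ if

$$\alpha(\mathcal{T}_d(\mu,\nu))\le H(\nu\mid\mu),\qquad\forall\nu\in\mathcal{P}_d(\mathcal{X}).\qquad(5.18)$$

Theorem 5.19 ($T_1$-inequalities). Let $d$ be a lower semicontinuous metric. Suppose that $a>0$ satisfies $\int_{\mathcal{X}}e^{a\,d(x_0,x)}\,\mu(dx)\le2$ for some $x_0\in\mathcal{X}$ and that $\gamma\in\mathcal{C}$ satisfies $\int_{\mathcal{X}}e^{\gamma(d(x_1,x))}\,\mu(dx)\le B<\infty$ for some $x_1\in\mathcal{X}$. Then

$$\alpha(t)=\max\Big((\sqrt{at+1}-1)^2,\ 2\gamma(t/2)-2\log B\Big),\qquad t\ge0,$$

satisfies (5.18).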

Conversely, if a function a in the class C satisfies A5.1ty) . then 



exp[u a(d(x*,x)/2)] fj,(dx) < oo 
for all x* G X and all < u < 2. 



x 



Proof. Gathering Corollary 5.14-a, Proposition 5.16 and the trick (5.1) gives us the first statement. The second statement is a particular instance of Corollary 5.14-b. □
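To see how the two terms of $\alpha$ interact, here is a sketch for a hypothetical example: $\mu = \mathcal{N}(0,1)$ on $\mathbb{R}$ with $d(x,y) = |x-y|$ and $x_o = x_1 = 0$. Then $\int_X e^{a|x|}\,\mu(dx) = 2e^{a^2/2}\,\mathbb{P}(G \le a)$ with $G \sim \mathcal{N}(0,1)$, and the choice $\gamma(t) = t^2/4$ gives $\int_X e^{\gamma(|x|)}\,\mu(dx) = \sqrt{2}$, so one may take $B = \sqrt{2}$. The code evaluates $\alpha$ of Theorem 5.19 as stated above; the first term is the relevant one for small $t$, the quadratic second term for large $t$.

```python
import math


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def laplace_abs_gaussian(a):
    """E[exp(a |G|)] for G ~ N(0, 1), equal to 2 exp(a^2 / 2) P(G <= a)."""
    return 2.0 * math.exp(a * a / 2.0) * normal_cdf(a)


def largest_a(laplace, tol=1e-12):
    """Largest a >= 0 with laplace(a) <= 2 (laplace continuous, increasing, unbounded)."""
    lo, hi = 0.0, 1.0
    while laplace(hi) <= 2.0:
        lo, hi = hi, 2.0 * hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if laplace(mid) <= 2.0:
            lo = mid
        else:
            hi = mid
    return lo


def gamma(t):
    return t * t / 4.0  # hypothetical choice; E[exp(G^2 / 4)] = (1 - 1/2)^(-1/2)


B = math.sqrt(2.0)
A = largest_a(laplace_abs_gaussian)


def alpha_5_19(t, a=A):
    """alpha(t) = max{(sqrt(a t + 1) - 1)^2, 2 gamma(t / 2) - 2 log B}."""
    return max((math.sqrt(a * t + 1.0) - 1.0) ** 2, 2.0 * gamma(t / 2.0) - 2.0 * math.log(B))


print(A, [round(alpha_5_19(t), 4) for t in (0.5, 2.0, 10.0)])
```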

Note that, by Proposition 3.19, it is impossible for $\alpha$ to escape quadratic growth at the origin.

Theorem 5.19 extends the integral criteria for the usual $T_1(C)$-inequality in [8] and [3]. Nevertheless, the control of the constant $C$ is handled more carefully in these cited papers. In a forthcoming paper (see the PhD manuscript [D]), one of the authors has obtained the following result, which is very much in the spirit of [H] and [3].

Theorem 5.20. Suppose that $c(x,y) = d^p(x,y)$, that $\alpha$ satisfies, for some $a > 0$,

and that $\alpha^\circledast$ is unbounded on its effective domain. Then, the following statements are equivalent:

• There exists $b_1 > 0$ such that $\alpha(b_1 T_{d^p}(\mu,\nu)) \le H(\nu \mid \mu)$ for all $\nu \in \mathcal{P}(X)$ such that $\int_X d^p(x_o,x)\,\nu(dx) < \infty$.

• There exists $b_2 > 0$ such that $\iint_{X^2} e^{\alpha(b_2 d^p(x,y))}\,\mu(dx)\mu(dy) < +\infty$.

Further details concerning the relation between $b_1$ and $b_2$ can be found in [Tu].
6. Some applications: concentration of measure and deviations of empirical processes

In this section, we give some applications of $T_1$-inequalities. The first application, Theorem 6.3, is an easy extension of a well-known result of K. Marton. The second one, Theorem 6.10, is more original and concerns the deviations of empirical processes. Throughout this section, $d$ is a metric on $X$ which turns $(X,d)$ into a Polish space.

6.1. A basic lemma. Theorem 6.3 and Theorem 6.10 both rely on the following elementary lemma.

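Lemma 6.1. Let $\mu \in \mathcal{P}(X)$ be such that $\int_X d(x_o,x)\,\mu(dx) < +\infty$ for all $x_o \in X$, and suppose that the $T_1$-inequality

$$\alpha(T_d(\mu,\nu)) \le H(\nu \mid \mu), \qquad \forall \nu \in \mathcal{P}(X),$$

holds. Then, for every 1-Lipschitz function $\varphi$, one has

$$\mu(\varphi \ge \langle\varphi,\mu\rangle + t) \le e^{-\alpha(t)}, \qquad \forall t \ge 0. \qquad (6.2)$$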

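Proof. Let $\varphi$ be a 1-Lipschitz function. For every $n \ge 1$, let us consider $\varphi_n = (\varphi \wedge n) \vee (-n)$. According to point b. of Theorem 3.17, one has

$$\Lambda_{\varphi_n}(s) := \log\int_X e^{s(\varphi_n - \langle\varphi_n,\mu\rangle)}\,d\mu \le \alpha^\circledast(s), \qquad \forall s \ge 0.$$

By dominated convergence, $\langle\varphi_n,\mu\rangle \to \langle\varphi,\mu\rangle$. Thus, by Fatou's lemma, one has

$$\Lambda_\varphi(s) := \log\int_X e^{s(\varphi - \langle\varphi,\mu\rangle)}\,d\mu \le \liminf_{n\to+\infty}\Lambda_{\varphi_n}(s) \le \alpha^\circledast(s), \qquad \forall s \ge 0.$$

Now, thanks to Chebyshev's argument, one has for all $t \ge 0$:

$$\mu(\varphi \ge \langle\varphi,\mu\rangle + t) \le \inf_{s\ge 0}\int_X e^{s(\varphi - \langle\varphi,\mu\rangle - t)}\,d\mu \le \inf_{s\ge 0} e^{\alpha^\circledast(s) - st} = e^{-\alpha(t)},$$

where the last equality is $\sup_{s\ge0}\{st - \alpha^\circledast(s)\} = \alpha^{\circledast\circledast}(t) = \alpha(t)$, valid for $\alpha \in \mathcal{C}$.

□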
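The last display is the usual Chernoff optimization. As a sketch, assume $\alpha(t) = t^2/2$, which the standard Gaussian measure on $\mathbb{R}$ satisfies (for instance by Talagrand's inequality $T_{d^2} \le 2H$ combined with $T_d \le T_{d^2}^{1/2}$); then $\alpha^\circledast(s) = s^2/2$. A grid minimization over $s \ge 0$ of $e^{\alpha^\circledast(s) - st}$ recovers $e^{-\alpha(t)}$, which dominates the exact Gaussian tail of the 1-Lipschitz function $\varphi(x) = x$.

```python
import math


def chernoff_bound(alpha_conj, t, s_max=20.0, steps=20_000):
    """Grid approximation of inf_{s >= 0} exp(alpha_conj(s) - s t) over [0, s_max]."""
    best = math.inf
    for k in range(steps + 1):
        s = s_max * k / steps
        best = min(best, math.exp(alpha_conj(s) - s * t))
    return best


def gaussian_upper_tail(t):
    """P(G >= t) for G ~ N(0, 1)."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def alpha_conj(s):
    return s * s / 2.0  # monotone conjugate of alpha(t) = t^2 / 2


for t in (0.5, 1.0, 2.0, 3.0):
    bound = chernoff_bound(alpha_conj, t)
    print(t, gaussian_upper_tail(t), bound, math.exp(-t * t / 2.0))
```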



6.2. $T_1$-inequalities and concentration of measure. Let us recall that, for a given probability measure $\mu$ on a Polish space $X$, the concentration function of $\mu$ is defined by

$$\theta_\mu(r) = \sup\{1 - \mu(A_r) : A \text{ Borel set such that } \mu(A) \ge 1/2\}, \qquad \forall r > 0,$$

where

$$A_r := \{x \in X : d(x,A) < r\}.$$

One says that $\theta$ is a concentration function for $\mu$ if there is $r_o > 0$ such that

$$\theta_\mu(r) \le \theta(r), \qquad \forall r \ge r_o,$$

or equivalently

$$\mu(A_r) \ge 1 - \theta(r), \qquad \forall r \ge r_o,\ \forall A \text{ Borel set with } \mu(A) \ge 1/2.$$

Roughly speaking, the following theorem states that if $\alpha$ is a $T_1$-transportation function for $\mu$, then $e^{-\alpha}$ is a concentration function for $\mu$. This link between transportation cost inequalities and concentration inequalities was first noticed by K. Marton, see [H]. Her result extends as follows.
Theorem 6.3. Let $\mu \in \mathcal{P}(X)$ be such that $\int_X d(x_o,x)\,\mu(dx) < +\infty$ for all $x_o \in X$, and suppose that the $T_1$-inequality

$$\alpha(T_d(\mu,\nu)) \le H(\nu \mid \mu), \qquad \forall \nu \in \mathcal{P}(X),$$

holds with an unbounded $\alpha \in \mathcal{C}$. Then, for all measurable $A$ with $\mu(A) > 0$, one has the following concentration of measure inequality:

$$\mu(A_r) \ge 1 - e^{-\alpha(r - r_A)}, \qquad \forall r \ge r_A, \qquad (6.4)$$

where $r_A := \alpha^{-1}(-\log\mu(A))$.
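Inequality (6.4) can be sanity-checked in a case where everything is explicit. Assume, for the sake of the sketch, that $\mu = \mathcal{N}(0,1)$ on $\mathbb{R}$ with $d(x,y) = |x-y|$ and $\alpha(t) = t^2/2$ (a valid choice, by Talagrand's inequality for the standard Gaussian measure). For the hypothetical sets $A = [-h,h]$, one has $A_r = (-h-r, h+r)$ and $r_A = \sqrt{-2\log\mu(A)}$, and the tail $1 - \mu(A_r)$ can be compared with $e^{-(r-r_A)^2/2}$ for $r \ge r_A$. Tails are computed with `math.erfc` to avoid rounding near 1.

```python
import math


def r_A(mass):
    """r_A = alpha^{-1}(-log mu(A)) for alpha(t) = t^2 / 2."""
    return math.sqrt(-2.0 * math.log(mass))


def tail_vs_bound(h, r):
    """For mu = N(0, 1) and A = [-h, h], return (1 - mu(A_r), exp(-alpha(r - r_A)))."""
    mass = math.erf(h / math.sqrt(2.0))              # mu(A) = 2 Phi(h) - 1
    outside = math.erfc((h + r) / math.sqrt(2.0))    # 1 - mu(A_r) = 2 (1 - Phi(h + r))
    return outside, math.exp(-((r - r_A(mass)) ** 2) / 2.0)


h = 0.5
ra = r_A(math.erf(h / math.sqrt(2.0)))
for r in (ra, ra + 0.5, ra + 1.0, ra + 2.0):
    print(r, *tail_vs_bound(h, r))
```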

The following proof is different from Marton's original argument: our proof is based on deviation arguments, while Marton's is based on transportation. For a proof using Marton's concentration arguments, see Proposition VI.8 in [lU].

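Proof. The function $x \mapsto d(x,A)$ is 1-Lipschitz. Thus, according to Lemma 6.1,

$$\mu\big(d(\cdot,A) \ge t + \langle d(\cdot,A),\mu\rangle\big) \le e^{-\alpha(t)}, \qquad \forall t \ge 0.$$

In order to derive (6.4), the only thing to do is to show that $\langle d(\cdot,A),\mu\rangle \le \alpha^{-1}(-\log\mu(A))$.

Let $\nu \in \mathcal{P}(X)$ be such that $\nu(A) = 1$. According to the $T_1$-inequality satisfied by $\mu$, one has

$$\int_X d(\cdot,A)\,d\mu = \int_X d(\cdot,A)\,d\mu - \int_X d(\cdot,A)\,d\nu \le T_d(\mu,\nu) \le \alpha^{-1}(H(\nu \mid \mu)).$$

Thus,

$$\langle d(\cdot,A),\mu\rangle \le \alpha^{-1}\big(\inf\{H(\nu \mid \mu) : \nu(A) = 1\}\big).$$

Let $\mu_A \in \mathcal{P}(X)$ be defined by $d\mu_A = \frac{\mathbf{1}_A}{\mu(A)}\,d\mu$; clearly $\mu_A(A) = 1$, so

$$\inf\{H(\nu \mid \mu) : \nu(A) = 1\} \le H(\mu_A \mid \mu). \qquad (6.5)$$

An easy computation yields $H(\mu_A \mid \mu) = -\log\mu(A)$. □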

Note that $d(\cdot,A)$ is unbounded, so that the inequality $\int_X d(\cdot,A)\,d\mu - \int_X d(\cdot,A)\,d\nu \le T_d(\mu,\nu)$ needs to be justified. Let $\pi$ be a probability measure on $X^2$ with marginals $\mu$ and $\nu$; then, since $d(\cdot,A)$ is 1-Lipschitz, $\int_X d(\cdot,A)\,d\mu - \int_X d(\cdot,A)\,d\nu = \iint_{X^2} [d(x,A) - d(y,A)]\,\pi(dxdy) \le \iint_{X^2} d(x,y)\,\pi(dxdy)$. Optimizing in $\pi$ leads to the desired result.

Some comments. In Marton's approach, the probability measure $\mu_A$ also plays a major role. Thanks to our approach, this role can be further explained. The choice of $\mu_A$ is optimal in the sense that (6.5) holds with equality:

$$\inf\{H(\nu \mid \mu) : \nu(A) = 1\} = H(\mu_A \mid \mu). \qquad (6.6)$$

In other words, $\mu_A$ is Csiszár's I-projection of $\mu$ on $\{\nu \in \mathcal{P}(X) : \nu(A) = 1\}$, see [SHH]. If $\nu$ is such that $\nu(A) = 1$, one has

$$\begin{aligned}
H(\nu \mid \mu) &= H(\nu \mid \mu_A) + \int_X \log\frac{d\mu_A}{d\mu}\,d\nu\\
&= H(\nu \mid \mu_A) + \int_X \log\mathbf{1}_A\,d\nu - \log\mu(A)\\
&= H(\nu \mid \mu_A) + H(\mu_A \mid \mu),
\end{aligned}$$

where the last equality follows from $\int_X \log\mathbf{1}_A\,d\nu = 0$ and $H(\mu_A \mid \mu) = -\log\mu(A)$. Since $H(\nu \mid \mu_A) \ge 0$, with equality for $\nu = \mu_A$, this proves (6.6).
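On a finite space, the decomposition above can be checked directly. In this sketch, $\mu$, $A$ and $\nu$ are hypothetical, and $\nu$ is any measure with $\nu(A) = 1$.

```python
import math


def relative_entropy(nu, mu):
    """H(nu | mu) on a finite set; math.inf if nu is not absolutely continuous w.r.t. mu."""
    h = 0.0
    for p, q in zip(nu, mu):
        if p > 0:
            if q == 0:
                return math.inf
            h += p * math.log(p / q)
    return h


def conditioned(mu, in_A):
    """Return mu_A = 1_A mu / mu(A) together with mu(A)."""
    mass = sum(m for m, a in zip(mu, in_A) if a)
    return [m / mass if a else 0.0 for m, a in zip(mu, in_A)], mass


mu = [0.1, 0.2, 0.3, 0.4]
in_A = [False, True, True, False]
mu_A, mass = conditioned(mu, in_A)
nu = [0.0, 0.7, 0.3, 0.0]  # any nu with nu(A) = 1
lhs = relative_entropy(nu, mu)
rhs = relative_entropy(nu, mu_A) + relative_entropy(mu_A, mu)
print(lhs, rhs, -math.log(mass))
```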

6.3. $T_1$-inequalities and deviation bounds for empirical processes. Lemma 6.1 together with the tensorization property of Theorem 4.12 immediately implies the following

Lemma 6.7. Let $\mu \in \mathcal{P}(X)$ be such that $\int_X d(x_o,x)\,\mu(dx) < +\infty$ for all $x_o \in X$, and suppose that the $T_1$-inequality

$$\alpha(T_d(\mu,\nu)) \le H(\nu \mid \mu), \qquad \forall \nu \in \mathcal{P}(X),$$

holds. Then, for every function $Z : X^n \to \mathbb{R}$ which is $1/n$-Lipschitz with respect to the metric $d^{\oplus n}(x,y) := \sum_{i=1}^n d(x_i,y_i)$, one has

$$\mu^{\otimes n}\big(Z \ge \langle Z,\mu^{\otimes n}\rangle + t\big) \le e^{-n\alpha(t)}, \qquad \forall t \ge 0. \qquad (6.8)$$



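Let us consider a class $\mathcal{Q}$ of 1-Lipschitz functions on $X$, and $(X_i)$ an iid sample of law $\mu$. Let $Z_n^{\mathcal{Q}}$ be defined by

$$Z_n^{\mathcal{Q}} := \sup_{\varphi\in\mathcal{Q}}\Big\{\Big|\frac1n\sum_{i=1}^n \varphi(X_i) - \int_X \varphi\,d\mu\Big|\Big\}. \qquad (6.9)$$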



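As $0 \le Z_n^{\mathcal{Q}} = \sup_{\varphi\in\mathcal{Q}}\{|\int_X \varphi\,dL_n - \int_X \varphi\,d\mu|\} \le T_d(L_n,\mu)$, where $L_n = \frac1n\sum_{i=1}^n \delta_{X_i}$ is the empirical measure, one has $Z_n^{\mathcal{Q}} \in [0,+\infty)$. Further, as a supremum of $1/n$-Lipschitz functions, the function

$$(x_1,\dots,x_n) \mapsto \sup_{\varphi\in\mathcal{Q}}\Big\{\Big|\frac1n\sum_{i=1}^n \varphi(x_i) - \int_X \varphi\,d\mu\Big|\Big\}$$

is $1/n$-Lipschitz too. This implies in particular that $Z_n^{\mathcal{Q}}$ is measurable. The random variable $Z_n^{\mathcal{Q}}$ is called an empirical process. Applying Lemma 6.7, one immediately obtains the following theorem.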

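Theorem 6.10. Let $\mu \in \mathcal{P}(X)$ be such that $\int_X d(x_o,x)\,\mu(dx) < +\infty$ for all $x_o \in X$, and suppose that the $T_1$-inequality

$$\alpha(T_d(\mu,\nu)) \le H(\nu \mid \mu), \qquad \forall \nu \in \mathcal{P}(X),$$

holds. If $\mathcal{Q}$ is a class of 1-Lipschitz functions on $X$, then the empirical process $Z_n^{\mathcal{Q}}$ defined by (6.9) satisfies the following inequality:

$$\mathbb{P}\big(Z_n^{\mathcal{Q}} \ge \mathbb{E}[Z_n^{\mathcal{Q}}] + t\big) \le e^{-n\alpha(t)}, \qquad \forall t \ge 0. \qquad (6.11)$$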

The literature about the deviations of empirical processes is huge. For a good overview of this subject, one can read P. Massart's Saint-Flour lecture notes [To].

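Now, if $(X,\|\cdot\|)$ is a Banach space and $\mu \in \mathcal{P}(X)$ is such that $\int_X \|x\|\,d\mu < +\infty$, then, taking $\mathcal{Q} = \{\ell \in X^* : \|\ell\|_{X^*} = 1\}$, where $X^*$ is the topological dual space of $X$, one obtains by the Hahn-Banach theorem

$$Z_n^{\mathcal{Q}} = \Big\|\frac1n\sum_{i=1}^n X_i - \int_X x\,d\mu\Big\|,$$

where $\int_X x\,\mu(dx)$ is well defined in the Bochner sense. In this special case, we have the following result.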
Theorem 6.12. Let $\mu \in \mathcal{P}(X)$ be such that $\int_X \|x\|\,\mu(dx) < +\infty$, and suppose that the $T_1$-inequality

$$\alpha(T_{\|\cdot\|}(\mu,\nu)) \le H(\nu \mid \mu), \qquad \forall \nu \in \mathcal{P}(X),$$

holds. If $(X_i)$ is an iid sequence of law $\mu$, then, letting $Z_n = \|\frac1n\sum_{i=1}^n X_i - \int_X x\,d\mu\|$, one has

$$\mathbb{P}(Z_n \ge \mathbb{E}[Z_n] + t) \le e^{-n\alpha(t)}, \qquad \forall t \ge 0. \qquad (6.13)$$

Remark 6.14. In order to obtain precise deviation results for $Z_n^{\mathcal{Q}}$ (resp. $Z_n$), one must be able to estimate the term $\mathbb{E}[Z_n^{\mathcal{Q}}]$ (resp. $\mathbb{E}[Z_n]$).

Let us give some examples. 

Example 1. Quantitative versions of Sanov's theorem. Suppose that $\mathcal{Q}$ is the set of all bounded 1-Lipschitz functions on $X$; then $Z_n^{\mathcal{Q}} = T_d(L_n,\mu)$, see (2.2).
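On the real line with $d(x,y) = |x-y|$, $T_d$ between two finitely supported probability measures equals $\int_{\mathbb{R}} |F - G|$, where $F$ and $G$ are their distribution functions (a standard fact about $W_1$ on $\mathbb{R}$). The sketch below uses this to evaluate $Z_n^{\mathcal{Q}} = T_d(L_n,\mu)$ exactly for a hypothetical discrete $\mu$ and hypothetical samples, and checks the $1/n$-Lipschitz property with respect to $d^{\oplus n}$ used in Section 6.3.

```python
def w1_real_line(xs, ws, ys, vs):
    """W1 between sum_i ws[i] delta_{xs[i]} and sum_j vs[j] delta_{ys[j]} on R,
    computed as the integral of |F - G| (F, G the two distribution functions)."""
    events = sorted([(x, w) for x, w in zip(xs, ws)] + [(y, -v) for y, v in zip(ys, vs)])
    total, cdf_diff, prev = 0.0, 0.0, None
    for point, mass in events:
        if prev is not None:
            total += abs(cdf_diff) * (point - prev)  # F - G is constant on [prev, point)
        cdf_diff += mass
        prev = point
    return total


def empirical_w1(sample, ref_points, ref_weights):
    """T_d(L_n, mu) for the empirical measure L_n of `sample` and a discrete mu."""
    n = len(sample)
    return w1_real_line(sample, [1.0 / n] * n, ref_points, ref_weights)


ref_points, ref_weights = [-1.0, 0.0, 1.0], [0.25, 0.5, 0.25]
x = [0.3, -1.2, 2.5, 0.0]
y = [0.5, -1.0, 2.0, 0.1]
zx = empirical_w1(x, ref_points, ref_weights)
zy = empirical_w1(y, ref_points, ref_weights)
lipschitz_budget = sum(abs(a - b) for a, b in zip(x, y)) / len(x)  # d^{+n}(x, y) / n
print(zx, zy, abs(zx - zy) <= lipschitz_budget)
```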
The following theorem is Theorem 10.2.1 of [T] (volume II).

Theorem 6.15. Let $\mu$ be a probability measure on $\mathbb{R}^q$ (equipped with its usual Euclidean norm $\|\cdot\|_2$) such that

$$c := \int \|x\|_2^{q+5}\,d\mu < +\infty. \qquad (6.16)$$

Then, there is $D > 0$, depending only on $c$ and $q$, such that

$$\mathbb{E}[T_{d_2}(L_n,\mu)] \le D\,n^{-1/(q+4)}, \qquad (6.17)$$

where $d_2$ is the metric associated with $\|\cdot\|_2$.

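Thanks to this result, one obtains the following quantitative version of Sanov's theorem.

Corollary 6.18. Let $\mu$ be a probability measure on $\mathbb{R}^q$ satisfying (6.16) and the $T_1$-inequality

$$\alpha(T_{d_2}(\mu,\nu)) \le H(\nu \mid \mu), \qquad \forall \nu \in \mathcal{P}(\mathbb{R}^q),$$

where $d_2$ is the usual Euclidean metric on $\mathbb{R}^q$. Then, the following inequality holds:

$$\mathbb{P}\big(T_{d_2}(L_n,\mu) \ge t\big) \le \exp\Big(-n\,\alpha\Big(t - \frac{D}{n^{1/(q+4)}}\Big)\Big), \qquad \forall t > 0,\ \forall n \ge \Big(\frac{D}{t}\Big)^{q+4},$$

where $D$ is the constant of (6.17). (Indeed, Example 1 and Theorem 6.10 give $\mathbb{P}(T_{d_2}(L_n,\mu) \ge \mathbb{E}[T_{d_2}(L_n,\mu)] + s) \le e^{-n\alpha(s)}$ for $s \ge 0$; one applies it with $s = t - Dn^{-1/(q+4)}$, which is nonnegative under the condition on $n$, and (6.17) gives $t \ge \mathbb{E}[T_{d_2}(L_n,\mu)] + s$.)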

In j^j, F. Bolley, A. Guillin and C. Villani have also obtained a quantitative version of 
Sanov theorem with alternative arguments. 

Example 2. Deviation bounds for empirical means. Let $X$ be a Banach space and consider

$$Z_n = \Big\|\frac1n\sum_{i=1}^n X_i - \int_X x\,d\mu\Big\|, \qquad (6.19)$$

where $(X_i)$ is an iid sequence of law $\mu$. In order to control the term $\mathbb{E}[Z_n]$, a classical assumption is to require that $X$ is of type $p \ge 1$, i.e. there is $b > 0$ such that for every sequence $(Y_i)_i$ of independent centered random variables with $\mathbb{E}[\|Y_i\|^p] < +\infty$, one has

$$\mathbb{E}[\|Y_1 + \cdots + Y_n\|^p] \le b\,\big(\mathbb{E}[\|Y_1\|^p] + \cdots + \mathbb{E}[\|Y_n\|^p]\big). \qquad (6.20)$$
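If $X$ is of type $p$ and $\mathbb{E}[\|X_1\|^p] < +\infty$, then one can deduce immediately from (6.20), applied to $Y_i = X_i - \mathbb{E}[X_i]$, together with Jensen's inequality $\mathbb{E}[Z_n] \le \mathbb{E}[Z_n^p]^{1/p}$, the following control:

$$\mathbb{E}[Z_n] \le \frac1n\big(b\,n\,\mathbb{E}[\|X_1 - \mathbb{E}[X_1]\|^p]\big)^{1/p}. \qquad (6.21)$$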
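The bound (6.21) can be checked exactly on a toy example. Assume $X = \mathbb{R}^2$ with the Euclidean norm; as a Hilbert space it is of type 2 with $b = 1$, since $\mathbb{E}[\|Y_1+\cdots+Y_n\|^2] = \sum_i \mathbb{E}[\|Y_i\|^2]$ for independent centered $Y_i$. For a hypothetical $\mu$ with three atoms, the sketch computes $\mathbb{E}[Z_n]$ by enumerating all $n$-tuples and compares it with the right-hand side of (6.21) for $p = 2$.

```python
import itertools
import math


def mean_vector(points, weights):
    dim = len(points[0])
    return [sum(w * p[j] for p, w in zip(points, weights)) for j in range(dim)]


def expected_zn(points, weights, n):
    """Exact E[Z_n], Z_n = ||(1/n) sum X_i - E[X_1]||, by enumerating all n-tuples."""
    m = mean_vector(points, weights)
    total = 0.0
    for idx in itertools.product(range(len(points)), repeat=n):
        prob = math.prod(weights[i] for i in idx)
        avg = [sum(points[i][j] for i in idx) / n for j in range(len(m))]
        total += prob * math.dist(avg, m)
    return total


def type2_bound(points, weights, n, b=1.0):
    """(1/n) (b n E||X_1 - E X_1||^2)^(1/2), i.e. (6.21) with p = 2."""
    m = mean_vector(points, weights)
    second_moment = sum(w * math.dist(p, m) ** 2 for p, w in zip(points, weights))
    return (b * n * second_moment) ** 0.5 / n


points = [(0.0, 0.0), (1.0, 2.0), (-2.0, 1.0)]
weights = [0.5, 0.3, 0.2]
for n in range(1, 6):
    print(n, expected_zn(points, weights, n), type2_bound(points, weights, n))
```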



Controls like (6.21) can be used in Theorem 6.12 to derive precise deviation bounds for empirical means. Let us conclude this section with a concrete example.

Theorem 6.22. Let $\mu$ be a probability measure on a Banach space $(X,\|\cdot\|)$ such that $\int_X e^{\delta\|x\|}\,\mu(dx) < +\infty$ for some $\delta > 0$. Then, for every sequence $(X_i)$ of iid random variables with law $\mu$, one has

$$\mathbb{P}(Z_n \ge \mathbb{E}[Z_n] + t) \le e^{-n\left(\sqrt{1 + t/M} - 1\right)^2}, \qquad \forall t \ge 0, \qquad (6.23)$$

where $Z_n$ is defined by (6.19) and $M := \inf\{b > 0 : \iint_{X^2} e^{\|x-y\|/b}\,\mu(dx)\mu(dy) \le 2\}$.

Proof. According to Corollary 3.26, $\mu$ satisfies the $T_1$-inequality

$$\alpha(T_{\|\cdot\|}(\mu,\nu)) \le H(\nu \mid \mu), \qquad \forall \nu \in \mathcal{P}(X),$$

with $\alpha(t) = \big(\sqrt{1 + t/M} - 1\big)^2$. Thus, applying Theorem 6.12, the result follows immediately. □

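Inequality (6.23) is very close to a well-known inequality by Yurinskii ([2H], Theorem 2.1). Under the same assumptions on $\mu$, one can easily derive from Yurinskii's result the following bound:

$$\mathbb{P}(Z_n \ge \mathbb{E}[Z_n] + t) \le \exp\Big(-\frac18\,\frac{nt^2}{2M_o^2 + tM_o}\Big), \qquad \forall t \ge 0, \qquad (6.24)$$

where $M_o = \inf\{b > 0 : \int_X e^{\|x\|/b}\,\mu(dx) \le 2\}$. To compare (6.23) and (6.24), first note that

$$\Big(\sqrt{1 + \frac tM} - 1\Big)^2 \ge \frac{t^2}{2(2M^2 + tM)}, \qquad \forall t \ge 0 \qquad (6.25)$$

(this is left to the reader; with $s = \sqrt{1 + t/M}$, it reduces to $2(s^2+1) \ge (s+1)^2$). Next, let us show that

$$M \le 2M_o. \qquad (6.26)$$

This follows from the following inequality:

$$\iint_{X^2} e^{\frac{\|x-y\|}{2M_o}}\,\mu(dx)\mu(dy) \overset{(i)}{\le} \Big(\int_X e^{\frac{\|x\|}{2M_o}}\,\mu(dx)\Big)^2 \overset{(ii)}{\le} \int_X e^{\frac{\|x\|}{M_o}}\,\mu(dx) \overset{(iii)}{\le} 2,$$

where (i) comes from the triangle inequality, (ii) from Jensen's inequality and (iii) from the definition of $M_o$. Thanks to (6.25) and (6.26), one obtains

$$\Big(\sqrt{1 + \frac tM} - 1\Big)^2 \ge \frac{t^2}{2(2M^2 + tM)} \ge \frac{t^2}{8(2M_o^2 + tM_o/2)} \ge \frac{t^2}{8(2M_o^2 + tM_o)}.$$

Thus, (6.23) is a little bit stronger than (6.24).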

Yurinskii's proof relies on martingale arguments, while our proof is a direct consequence 
of the tensorization mechanism. 
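The comparison between (6.23) and (6.24) can be made concrete on a toy example. The sketch assumes a hypothetical discrete law on the Banach space $(\mathbb{R}, |\cdot|)$, computes $M$ and $M_o$ by bisection on $1/b$ (both integrals are continuous and increasing in $1/b$), and checks (6.25), (6.26) and the final comparison of exponents on a grid of $t$.

```python
import math


def largest_a(laplace, tol=1e-12):
    """Largest a >= 0 with laplace(a) <= 2 (laplace continuous, increasing, unbounded)."""
    lo, hi = 0.0, 1.0
    while laplace(hi) <= 2.0:
        lo, hi = hi, 2.0 * hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if laplace(mid) <= 2.0:
            lo = mid
        else:
            hi = mid
    return lo


# Hypothetical discrete law on (R, |.|).
points = [-1.0, 0.0, 0.5, 2.0]
weights = [0.2, 0.3, 0.3, 0.2]

# M_o = inf{b : int exp(|x| / b) dmu <= 2} = 1 / (largest admissible a)
M_o = 1.0 / largest_a(lambda a: sum(w * math.exp(a * abs(x)) for x, w in zip(points, weights)))
# M = inf{b : iint exp(|x - y| / b) dmu dmu <= 2}
M = 1.0 / largest_a(lambda a: sum(
    w * v * math.exp(a * abs(x - y))
    for x, w in zip(points, weights) for y, v in zip(points, weights)))


def exponent_623(t):
    return (math.sqrt(1.0 + t / M) - 1.0) ** 2


def exponent_624(t):
    return t * t / (8.0 * (2.0 * M_o ** 2 + t * M_o))


print(M, 2 * M_o)
for t in (0.1, 1.0, 10.0):
    print(t, exponent_623(t), exponent_624(t))
```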
7. Large deviations and T-inequalities. Abstract results
The framework is the same as in Section 5; see in particular Remark 5.4.

7.1. A deviation function is a transportation function. In this section, we give a rigorous proof (Theorem 7.1) of the Recipe for an increasing deviation function which may not be convex. This extends Proposition 5.5.

Theorem 7.1. Let us assume (3.5) and (3.6).

(a) Any deviation function is a transportation function.

(b) If in addition $T$ is continuous on $\mathcal{P}_{\mathcal{F}}$, then the converse also holds: any transportation function is a deviation function.

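Proof. (a) As $T$ is lower semicontinuous, for all $t \ge 0$ the set $\{\nu \in \mathcal{P}_{\mathcal{F}};\ T(\nu) > t\}$ is open. It follows with the LD lower bound that

$$-\inf\{H(\nu \mid \mu);\ \nu \in \mathcal{P}_{\mathcal{F}},\ T(\nu) > t\} \le \liminf_{n\to\infty}\frac1n\log\mathbb{P}(T(L_n) > t).$$

Let $\alpha$ be any deviation function: for all $t \ge 0$, $\limsup_{n\to\infty}\frac1n\log\mathbb{P}(T(L_n) \ge t) \le -\alpha(t)$. Hence we obtain $\alpha(t) \le \inf\{H(\nu \mid \mu);\ \nu \in \mathcal{P}_{\mathcal{F}},\ T(\nu) > t\}$, so that $\alpha(t - \delta) \le H(\nu \mid \mu)$ for all $\nu \in \mathcal{P}_{\mathcal{F}}$ and $\delta > 0$ such that $T(\nu) > t - \delta$. Taking $t = T(\nu)$ leads us to $\alpha(T(\nu) - \delta) \le H(\nu \mid \mu)$ for all $\nu \in \mathcal{P}_{\mathcal{F}}$ and $\delta > 0$. As $\alpha$ is increasing and $\delta > 0$ is arbitrary, we have $\alpha(T(\nu)^-) \le H(\nu \mid \mu)$. The desired result follows from the assumed left continuity of $\alpha$.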

(b) As $T$ is continuous, because of the contraction principle, $\{T(L_n)\}$ obeys the LDP with rate function $i(t) = \inf\{H(\nu \mid \mu);\ \nu \in \mathcal{P}_{\mathcal{F}},\ T(\nu) = t\}$, $t \ge 0$. In particular, the LD upper bound $\limsup_{n\to\infty}\frac1n\log\mathbb{P}(T(L_n) \ge t) \le -\inf\{i(s);\ s \ge t\}$ is satisfied.
Let $\alpha$ be a transportation function. It clearly satisfies $\alpha(t) \le \inf\{H(\nu \mid \mu);\ \nu \in \mathcal{P}_{\mathcal{F}},\ T(\nu) = t\}$ for all $t$. That is: $\alpha \le i$. Finally, for all $t \ge 0$,

$$\limsup_{n\to\infty}\frac1n\log\mathbb{P}(T(L_n) \ge t) \le -\inf_{s\ge t} i(s) \le -\inf_{s\ge t}\alpha(s) = -\alpha(t),$$

where the last equality holds because $\alpha$ is increasing. This means that $\alpha$ is a deviation function. □

Remarks.

• Note that we did not use the specific form (2.4) of $T$, but only its lower semicontinuity.

• Similarly, we did not use the specific properties of the relative entropy, but only the fact that it is an LDP rate function for $\{L_n\}$.

• Statement (b) will not be used later, but it is satisfactory to know that a transportation function is not far from being a deviation function. A natural situation where $T$ is continuous appears with $c = d^p$, since the Wasserstein metric $T_{d^p}^{1/p}$ metrizes $\sigma(\mathcal{P}_{\mathcal{F}},\mathcal{F})$ with $\mathcal{F}$ the space of all continuous functions $\varphi$ such that $|\varphi(x)| \le c(1 + d(x_o,x)^p)$, $\forall x$, for some constant $c$, see ([H], Chapter 7).
7.2. The transportation function $J_\Phi$. With Theorem 7.1 in hand, it is enough to compute a deviation function $\alpha$ to obtain the TCI

$$\alpha(T(\nu)) \le H(\nu \mid \mu), \qquad \forall \nu \in \mathcal{P}_{\mathcal{F}}. \qquad (7.2)$$

But these functions may be rather hard to compute because of the sup in the definition (2.4) of

$$T_n = T(L_n) = \sup_{(\psi,\varphi)\in\Phi}\{\langle\varphi,L_n\rangle + \langle\psi,\mu\rangle\}.$$

However, it is shown at Theorem 7.7 below that more can be said about transportation functions.



Assumptions (A). The following requirements are assumed to hold.

(i) We assume (3.5): $\int_X e^{\varphi}\,d\mu < \infty$, $\forall \varphi \in \mathcal{F}$.

(ii) We assume (3.6): $(0,0) \in \Phi \subset \mathcal{F}\times\mathcal{F}$.

(iii) For all $(\psi,\varphi) \in \Phi$, $\psi + \varphi \le 0$.

Requirement (iii) always holds in the norm case $\Phi = \Phi_U$, and it holds in the transportation case $\Phi = \Phi_c$ if $c(x,x) = 0$, $\forall x \in X$.
Let us define

$$\Lambda(\varphi) := \log\int_X e^{\varphi}\,d\mu.$$

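Proposition 7.3. Under the assumption (3.5),

(a) $\{L_n\}$ obeys the LDP in $\mathcal{P}_{\mathcal{F}}$ with the rate function

$$H(\nu \mid \mu) = \Lambda^*(\nu) = \sup_{\varphi\in\mathcal{F}}\{\langle\varphi,\nu\rangle - \Lambda(\varphi)\}, \qquad \nu \in \mathcal{P}_{\mathcal{F}}; \qquad (7.4)$$

(b) and for all $(\psi,\varphi) \in \Phi$, $\{T_n^{\psi,\varphi}\}_{n\ge1}$, where $T_n^{\psi,\varphi} := \langle\varphi,L_n\rangle + \langle\psi,\mu\rangle$, obeys the LDP in $\mathbb{R}$ with the rate function

$$J_{\psi,\varphi}(t) = \sup_{s\in\mathbb{R}}\{st - \Lambda(s\varphi) - s\langle\psi,\mu\rangle\}, \qquad t \in \mathbb{R}.$$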

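Proof. Statement (a) is Theorem 3.2.

The function $J_{\psi,\varphi}$ is the convex conjugate of

$$\Lambda_{\psi,\varphi}(s) := \Lambda(s\varphi) + s\langle\psi,\mu\rangle, \qquad s \in \mathbb{R},$$

which satisfies $\frac1n\log\mathbb{E}\,e^{snT_n^{\psi,\varphi}} = \Lambda_{\psi,\varphi}(s)$ for all $n$ and $s$. Since $\Lambda_{\psi,\varphi}$ is a steep function under assumptions (ii) and (iii), (b) is a direct consequence of the Gärtner-Ellis theorem. □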
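The rate functions in (b) are ordinary Cramér transforms. As a sketch, take a hypothetical two-point space where $\mu$ charges a point $x_1$ with mass $p$, and let $\varphi = \mathbf{1}_{\{x_1\}}$ and $\psi = 0$ (ignoring whether this pair belongs to $\Phi$). Then $\Lambda(s\varphi) = \log(1 - p + pe^s)$, and for $0 < t < 1$ the transform $\sup_s\{st - \Lambda(s\varphi)\}$ equals the relative entropy of a Bernoulli($t$) law with respect to a Bernoulli($p$) law, that is $\inf\{H(\nu \mid \mu);\ \langle\varphi,\nu\rangle = t\}$, as the contraction principle predicts. The supremum is computed by ternary search, which is legitimate because $s \mapsto st - \Lambda(s\varphi)$ is concave.

```python
import math


def log_laplace(s, p):
    """Lambda(s phi) = log(1 - p + p e^s) for phi the indicator of an atom of mass p."""
    return math.log(1.0 - p + p * math.exp(s))


def legendre(t, p, lo=-50.0, hi=50.0, iters=300):
    """sup_s {s t - Lambda(s phi)} by ternary search (the objective is concave in s)."""
    def objective(s):
        return s * t - log_laplace(s, p)

    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    return objective(0.5 * (lo + hi))


def bernoulli_kl(t, p):
    """H(Bernoulli(t) | Bernoulli(p)) for 0 < t < 1."""
    return t * math.log(t / p) + (1.0 - t) * math.log((1.0 - t) / (1.0 - p))


p = 0.3
for t in (0.1, 0.3, 0.6, 0.9):
    print(t, legendre(t, p), bernoulli_kl(t, p))
```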

We know that $J_{\psi,\varphi}$ is convex with a minimum value attained at $\Lambda'_{\psi,\varphi}(0)$. Under assumption (iii), we have $\Lambda'_{\psi,\varphi}(0) = \langle\psi + \varphi,\mu\rangle \le 0$. Therefore, $J_{\psi,\varphi}$ is an increasing nonnegative function on $[0,\infty)$, and so are $\tilde J_\Phi$ and $J_\Phi$ given by

$$J_\Phi(t) := \tilde J_\Phi(t^-),\ t > 0, \quad\text{where}\quad \tilde J_\Phi(t) := \inf_{(\psi,\varphi)\in\Phi} J_{\psi,\varphi}(t) \in [0,\infty],\ t \ge 0, \qquad (7.5)$$

with $J_\Phi(0) = 0$. This last equality follows from assumption (ii). As $\Lambda'_{\psi,\varphi}(0) \le 0$, it also holds that for all $t \ge 0$, $J_{\psi,\varphi}(t) = \Lambda^\circledast_{\psi,\varphi}(t) := \sup_{s\ge0}\{st - \Lambda_{\psi,\varphi}(s)\}$, where the sup is taken over $s \ge 0$ rather than $s \in \mathbb{R}$. It follows that one can equivalently define $J_\Phi$ as follows.

Definition 7.6 (of the functions $J_\Phi$ and $J$).

• $J_\Phi$ is the left continuous version of the increasing function

$$t \in [0,\infty) \mapsto \inf_{(\psi,\varphi)\in\Phi}\sup_{s\ge0}\{st - \Lambda(s\varphi) - s\langle\psi,\mu\rangle\} \in [0,\infty].$$

• $J$ is the best transportation function. Clearly, it is the left continuous version of the increasing function

$$t \in [0,\infty) \mapsto \inf\{H(\nu \mid \mu);\ \nu \in \mathcal{P}_{\mathcal{F}} : T(\nu) \ge t\} \in [0,\infty].$$

Although the best transportation function J might be out of reach in many situations, 
we have the following reassuring result. 

Theorem 7.7. Suppose that Assumptions (A) hold. Then, $J_\Phi$ is a transportation function and the best transportation function in the class $\mathcal{C}$ is the convex lower semicontinuous regularization of $J_\Phi$.

Proof. This statement is a collection of the statements of Theorem 7.8-a and Corollary 7.11-a,b, which will be proved below. □

Theorem 7.8. Suppose that Assumptions (A) hold.

(a) Then, $J_\Phi$ is a transportation function for $T$ and $\{L_n\}$. This can be equivalently rewritten as the following TCI:

$$J_\Phi(T(\nu)) \le H(\nu \mid \mu), \qquad \forall \nu \in \mathcal{P}_{\mathcal{F}}.$$

(b) If in addition $T$ is continuous on $\mathcal{P}_{\mathcal{F}}$, then $J_\Phi$ is the best transportation function. It is also the best deviation function: this means that $J_\Phi = J$.

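Proof. (a) As $\nu \mapsto \langle\varphi,\nu\rangle + \langle\psi,\mu\rangle$ is continuous, it follows from the contraction principle that $J_{\psi,\varphi}(t) = \inf\{H(\nu \mid \mu);\ \nu \in \mathcal{P}_{\mathcal{F}},\ \langle\varphi,\nu\rangle + \langle\psi,\mu\rangle = t\}$ for all $t \ge 0$. Hence, $J_{\psi,\varphi}(\langle\varphi,\nu\rangle + \langle\psi,\mu\rangle) \le H(\nu \mid \mu)$ for all $\nu \in \mathcal{P}_{\mathcal{F}}$ and, a fortiori,

$$\tilde J_\Phi(\langle\varphi,\nu\rangle + \langle\psi,\mu\rangle) \le H(\nu \mid \mu)$$

as soon as $\langle\varphi,\nu\rangle + \langle\psi,\mu\rangle \ge 0$, where $\tilde J_\Phi = \inf_{(\psi,\varphi)\in\Phi} J_{\psi,\varphi}$ is defined at (7.5). As $\tilde J_\Phi$ is increasing, by the definition (2.4) of $T(\nu)$, one obtains $J_\Phi(T(\nu)) = \tilde J_\Phi(T(\nu)^-) \le H(\nu \mid \mu)$, which is the desired result. Note that $T(\nu) \ge 0$ since $(0,0) \in \Phi$ (assumption (A.ii)).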

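(b) Because of part (b) of Theorem 7.1, it is enough to prove that $J_\Phi = J$. Because of part (a) of the present theorem, $J_\Phi$ is a transportation function, and by part (b) of Theorem 7.1 it is also a deviation function. Therefore, $J_\Phi \le J$, and it remains to prove that $J \le J_\Phi$.

By the LD lower bound for $\{T_n^{\psi,\varphi}\}$, for all $t \ge 0$,

$$\begin{aligned}
-\inf_{r>t} J_{\psi,\varphi}(r) &\le \liminf_{n\to\infty}\frac1n\log\mathbb{P}(T_n^{\psi,\varphi} > t)\\
&\le \limsup_{n\to\infty}\frac1n\log\mathbb{P}\Big(\sup_{(\psi,\varphi)\in\Phi} T_n^{\psi,\varphi} > t\Big)\\
&\le -J(t),
\end{aligned}$$

where the last inequality holds because $\sup_{(\psi,\varphi)\in\Phi} T_n^{\psi,\varphi} = T_n$ and $J$, being a transportation function, is a deviation function by Theorem 7.1(b). Since $J_{\psi,\varphi}$ is increasing, we have $J(t) \le \inf_{r>t} J_{\psi,\varphi}(r) = J_{\psi,\varphi}(t^+)$, so that, for all $t \ge 0$,

$$J(t) \le \inf\{J_{\psi,\varphi}(t^+);\ (\psi,\varphi)\in\Phi\} = \inf_{(\psi,\varphi)\in\Phi}\inf_{u>t} J_{\psi,\varphi}(u) = \inf_{u>t}\inf_{(\psi,\varphi)\in\Phi} J_{\psi,\varphi}(u) = \tilde J_\Phi(t^+),$$

with $\tilde J_\Phi = \inf_{(\psi,\varphi)\in\Phi} J_{\psi,\varphi}$ as in (7.5). As $J$ and $\tilde J_\Phi$ are increasing and $J$ is left continuous, this gives $J(t) \le \tilde J_\Phi(t^-) = J_\Phi(t)$ for all $t > 0$, which is the desired result. □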

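7.3. Connections with Theorem 3.7. Let us first give an alternative proof of criterion (b) $\Rightarrow$ (a) of Theorem 3.7.

We keep the Assumptions (A) of Section 7.2. Note that because of Assumptions (A.ii) and (A.iii), the function

$$\Lambda_\Phi(s) := \sup_{(\psi,\varphi)\in\Phi}\Lambda_{\psi,\varphi}(s) = \sup_{(\psi,\varphi)\in\Phi}\{\Lambda(s\varphi) + s\langle\psi,\mu\rangle\}, \qquad s \ge 0, \qquad (7.9)$$

is in the class $\mathcal{C}$. It follows that its monotone conjugate

$$\Lambda_\Phi^\circledast(t) = \sup_{s\ge0}\{st - \Lambda_\Phi(s)\}, \qquad t \ge 0,$$

is also in $\mathcal{C}$. Thanks to formula (7.5), for all $t \ge 0$, we have

$$\begin{aligned}
\Lambda_\Phi^\circledast(t) &= \sup_{s\ge0}\Big\{st - \sup_{(\psi,\varphi)\in\Phi}\Lambda_{\psi,\varphi}(s)\Big\}\\
&= \sup_{s\ge0}\inf_{(\psi,\varphi)\in\Phi}\{st - \Lambda_{\psi,\varphi}(s)\}\\
&\le \inf_{(\psi,\varphi)\in\Phi}\sup_{s\ge0}\{st - \Lambda_{\psi,\varphi}(s)\}\\
&= \tilde J_\Phi(t),
\end{aligned}$$

where $\tilde J_\Phi = \inf_{(\psi,\varphi)\in\Phi} J_{\psi,\varphi}$ is the function appearing in (7.5). But $\Lambda_\Phi^\circledast$ is left continuous, hence

$$\Lambda_\Phi^\circledast \le J_\Phi. \qquad (7.10)$$

As $J_\Phi$ is a transportation function (Theorem 7.8), so is $\Lambda_\Phi^\circledast$.

The criterion (b) $\Rightarrow$ (a) of Theorem 3.7 follows from the above considerations. Indeed, (b) states that $\Lambda_\Phi \le \alpha^\circledast$. Therefore, with (7.10): $\alpha \le \Lambda_\Phi^\circledast \le J_\Phi$. Hence, $\alpha$ is a transportation function.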

An easy consequence of Theorem 3.7 is the following.
Corollary 7.11. Suppose that Assumptions (A) hold.

(a) The best transportation function in the class $\mathcal{C}$ is $\Lambda_\Phi^\circledast$. This means that $\alpha \in \mathcal{C}$ is a transportation function if and only if $\alpha \le \Lambda_\Phi^\circledast$.

(b) Moreover, $\Lambda_\Phi^\circledast$ is the convex lower semicontinuous regularization of $J_\Phi$ (in restriction to $t \in [0,\infty)$).

(c) If $T$ is continuous, then $\Lambda_\Phi^\circledast$ is also the best deviation function in the class $\mathcal{C}$.

Proof. The best function $\alpha^\circledast \in \mathcal{C}$ satisfying (b) of Theorem 3.7 is $\alpha^\circledast = \Lambda_\Phi$, see (7.9). Because of the equivalence (a) $\Leftrightarrow$ (b) of Theorem 3.7, its monotone conjugate $\Lambda_\Phi^\circledast$ is the best transportation function in $\mathcal{C}$. This is (a).

Let us prove (b). Below, $\phi = (\psi,\varphi)$ runs through $\Phi$. In order to work with usual convex conjugates, let us set $J_\phi(t) = +\infty$ for all $t < 0$ and $\phi \in \Phi$. We have

$$\Big(\inf_\phi J_\phi\Big)^*(s) = \sup_t\Big\{st - \inf_\phi J_\phi(t)\Big\} = \sup_{t,\phi}\{st - J_\phi(t)\} = \sup_\phi\sup_t\{st - J_\phi(t)\} = \sup_\phi J_\phi^*(s).$$

Hence, the convex lower semicontinuous regularization of $\tilde J_\Phi := \inf_\phi J_\phi$ is $(\inf_\phi J_\phi)^{**} = (\sup_\phi J_\phi^*)^* = (\sup_\phi \Lambda_\phi^{**})^*$. But the convex lower semicontinuous regularization of $\sup_\phi \Lambda_\phi$ is $\sup_\phi \Lambda_\phi^{**}$. Therefore, $\tilde J_\Phi^{**} = (\sup_\phi \Lambda_\phi^{**})^* = (\sup_\phi \Lambda_\phi)^* = \Lambda_\Phi^*$. But it has already been seen that, in restriction to $t \in [0,\infty)$, $\Lambda_\Phi$ is in $\mathcal{C}$, so that $\Lambda_\Phi^*(t) = \Lambda_\Phi^\circledast(t)$ for all $t \ge 0$.
Finally, (c) is a direct consequence of (b) and Theorem 7.8(b). □
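Statement (b) can be illustrated numerically. The sketch below uses a hypothetical two-element family of quadratic functions $\Lambda_k(s) = a_k s^2/2 + b_k s$ with $b_k \le 0$ (mirroring (A.iii)); they are not derived from any specific pair in $\Phi$. For $t \ge 0$, each monotone conjugate is $J_k(t) = (t - b_k)^2/(2a_k)$; the pointwise infimum of the $J_k$ is not convex, and its lower convex hull, computed on a grid, is compared with the monotone conjugate of $\max_k \Lambda_k$.

```python
import itertools

# Hypothetical family: Lambda_k(s) = a_k s^2 / 2 + b_k s with b_k <= 0, mirroring (A.iii).
FAMILY = [(1.0, 0.0), (16.0, -4.0)]


def J(t, a, b):
    """sup_{s >= 0} {s t - a s^2 / 2 - b s} = (t - b)^2 / (2 a), valid for t >= 0 >= b."""
    return (t - b) ** 2 / (2.0 * a)


def inf_J(t):
    """Pointwise infimum over the family; it need not be convex."""
    return min(J(t, a, b) for a, b in FAMILY)


def lower_convex_hull(points):
    """Lower convex hull (monotone chain) of points sorted by abscissa."""
    hull = []
    for p in points:
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # Drop the middle point when it lies on or above the chord to p.
            if (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1) <= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull


def hull_value(hull, t):
    """Piecewise linear interpolation of the hull at t."""
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        if x1 <= t <= x2:
            return y1 + (y2 - y1) * (t - x1) / (x2 - x1)
    raise ValueError("t outside the sampled range")


def monotone_conjugate_of_sup(t):
    """sup_{s >= 0} {s t - max_k Lambda_k(s)}.

    The objective is concave, so its maximum over s >= 0 is attained at s = 0, at a
    stationary point of one quadratic piece, or where two pieces cross.
    """
    def objective(s):
        return s * t - max(a * s * s / 2.0 + b * s for a, b in FAMILY)

    candidates = [0.0] + [(t - b) / a for a, b in FAMILY]
    for (a1, b1), (a2, b2) in itertools.combinations(FAMILY, 2):
        if a1 != a2:
            candidates.append(2.0 * (b2 - b1) / (a1 - a2))
    return max(objective(s) for s in candidates if s >= 0.0)


T_GRID = [8.0 * k / 8000 for k in range(8001)]
HULL = lower_convex_hull([(t, inf_J(t)) for t in T_GRID])

for t in (0.0, 1.0, 2.0, 4.0, 6.0):
    print(t, inf_J(t), hull_value(HULL, t), monotone_conjugate_of_sup(t))
```

On this example the infimum equals $t^2/2$ near the origin and $(t+4)^2/32$ for large $t$, and the convex regularization replaces the non-convex junction by a common tangent line.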